1.0  ABSTRACT


Of interest to the U.S. Air Force is the ability to develop and characterize the level of
workload that operators are under at any given point. When task demands exceed an operator’s
cognitive resources, a ‘red line’ of performance may be crossed, after which performance breaks
down. What is needed is an estimate of operator state: a ‘dipstick’ for the operator to
assess the level of ‘resources’ available, in order to avoid performance problems. Traditional
approaches  use  secondary  tasks  (e.g.,  mental  arithmetic)  or  secondary  physiological  measures 
(e.g.,  heart  rate  variability)  for  state  assessment.  However,  the  current  work  was  motivated  by 
dynamic  systems  theory  which  indicates  that  there  are  meaningful  patterns  of  variability  in 
‘primary’  behaviors  (e.g.,  required  activities)  which  might  provide  a  measure  of  operator  state. 
The  present  work  uses  eye  gaze  as  a  primary  measure  in  a  visual  puzzle  task.  The  link  between 
eye gaze and attention is generally accepted, as is the link between attention and performance
outcomes. The goal of Experiment 1 was to determine whether performance changes in a visual puzzle
task were reflected in eye gaze, as measured in multiple ways: conventional (e.g., average
fixation length) and dynamic (e.g., β values, measures derived from a recurrence matrix). These
relationships were explored in relation to task difficulty, time on task, and spare capacity.
The results of Experiment 1 suggest that task demands affect gaze patterns, for
both conventional and dynamic gaze metrics. There were also significant effects of practice on eye gaze
patterns in Experiment 1 that could be interpreted as learning or strategy shifts. The impact of
learning on eye gaze was explored in a follow-up experiment. The results of Experiment 2 show
a significant improvement in task performance, accompanied by a change in gaze patterns,
when repeating the same puzzle, and show that the dynamic measure of diagonal recurrence was
systematically related to this performance change. This suggests that non-conventional measures
of dynamic structure provide additional and complementary information about operator state.
2.0  INTRODUCTION


The  nature  of  military  operations  is  often  one  of  high  complexity  and  high  demand  on  the 
operators.  Of  interest  to  the  U.S.  Air  Force  is  the  ability  to  develop  and  characterize  the  level  of 
workload that operators are under at any given point. The issue is one of overall performance:
successful performance requires a balance between the available resources or capacity of the
operators and expected demands in order to maintain desirable levels of performance. Periods
of high workload are to be expected, and therefore some spare capacity of the operator is
desirable to deal with unexpected events. Additionally, sustained periods of high workload are
likely to result in negative performance outcomes. A conceptual diagram of one type of
interaction among resource availability, task demands, and performance (the Cusp Catastrophe
model; Guastello et al., 2011) is depicted in Figure 1.


Figure 1. A conceptual diagram of the red line for workload and performance. The y-axis represents a generic
increase in all variables. The x-axis represents the passage of time. Performance may stay steady as resources
are depleted (dotted line) with increasing demands (dashed line), but at some point a red line will be crossed,
after which performance decreases below acceptable levels (falls outside of the blue boundaries).


Conceptually,  operators  have  limited  resources  (e.g.,  perceptual  limitations,  processing 
limitations) to deal with their tasks, but will manage well most of the time. However, as
diagrammed in Figure 1, a combination of limited resources (dotted line) and increasing demands
(dashed line) can create a situation in which performance drops (grey and black lines) outside the
range of acceptable performance (blue lines). As resources become strained, performance can
often be maintained for some indefinite period of time, but eventually a qualitative breakdown in
performance outcomes (e.g., mission failure) will occur. This point, after which a breakdown in
performance is inevitable, can be characterized as a ‘red line’ (Grier et al., 2008). Avoiding the
‘red line’ is critical; typical military tasks are in domains in which performance failures are at a
minimum undesired (e.g., transportation delays) and potentially catastrophic (e.g., air traffic
control accident, loss of life or critical equipment). What is needed is a ‘dipstick’ for the
operator: some way to gain information about the level of ‘resources’ available at any given
point.


The  issue  is  certainly  multifaceted,  and  there  has  been  a  large  body  of  work  in  this  area 
(e.g.,  Tsang  &  Vidulich,  2006).  However,  the  focus  of  the  present  work  is  not  to  classify  or 
model the source(s) of workload, but rather to approach the problem more generally with regard to
how  the  state  of  the  operator  might  be  influenced  by  task  demands  in  a  way  that  is  detectable  by 
some  parameter  or  measurement  from  the  operator.  This  could  provide  an  objective  indication 
of  operator  state,  as  opposed  to  a  subjective  indicator  derived  via  questionnaires  (e.g.,  NASA 
Task  Load  Index;  Hart  and  Staveland,  1988).  At  a  minimum,  a  signal  needs  to  be  loosely 
coupled  to  performance  outcomes.  In  order  to  be  useful  from  an  operational  standpoint,  it  also 
needs  to  be  relatively  unobtrusive  to  collect.  Ideally,  this  measurement  would  allow  for  a 
prediction  of  a  future  qualitative  change  in  performance  outcomes. 

The  research  strategy  adopted  by  the  Applied  Neuroscience  Branch  of  the  Air  Force  is 
the Sense-Assess-Augment framework (Parasuraman & Galster, 2013). First, provide adequate
sensor  capability  to  measure  the  appropriate  phenomena  or  parameters  to  detect  the  underlying 
state  (Sense);  analyze  the  data  in  such  a  way  as  to  gain  insight  into  the  underlying  state  of  the 
operator  in  relation  to  performance  (Assess);  and  finally  provide  corrective  action  or  intervention 
if needed (Augment). The general goal is to find a signal which is ‘loosely coupled’ to
performance: for predictive purposes, quantitative changes in the signal should be evident even
if overall performance remains constant. Prior to the red line, a critical value in the signal
should readily identify an upcoming qualitative performance change. For the present work, the
term operator state assessment will be used to represent this idea: measuring a parameter or
signal from the operator that relates to the availability of ‘resources’ in order to predict
performance.

A  common  approach  to  assessment  is  the  addition  of  a  secondary  task  (e.g.,  mental 
arithmetic, tracking tasks, etc.) to the primary task of interest. A dual-task paradigm allows for
measurement of performance for both primary and secondary tasks; by manipulating the
difficulty of one of the tasks, changes in the other can be used to estimate levels of spare
capacity. While this method has been shown to be effective in laboratory settings (e.g., Ogden
et al., 1979; O’Donnell & Eggemeier, 1986), the ability to make assessments of operator state
comes  at  the  cost  of  adding  more  work  for  the  operator,  which  is  undesirable  in  typical 
operational  settings. 

Physiological  signals  represent  another  type  of  measurement  that  has  been  hypothesized 
to reflect the state of the operator, and multiple physiological signals have been studied. A
short list, certainly not all-inclusive, includes heart rate variability (HRV; reviewed by Jorna,
1992), brain activity as measured by electroencephalogram (EEG; Wilson, 2002), and cerebral
blood flow velocity (reviewed by Warm, Parasuraman, & Matthews, 2008). Each has been
shown to be related to performance outcomes in some way (e.g., vigilance decrement and
blood flow velocity), but these relationships are not definitive. Drawbacks with regard to lack of
sensitivity to workload changes (HRV), signal/noise problems (EEG), and intrusiveness or
feasibility of implementation (cerebral blood flow) have limited the overall success in both
laboratory and operational settings. With additional research and technological innovation these
limitations may be overcome; at present, however, research in the field of complexity and
nonlinear dynamics may provide an alternative way to assess the state of the operator from
primary measures of behavior, rather than ‘secondary’ physiological measures or tasks.

Consider ‘raw performance’ diagrammed in Figure 1 (grey line). Mean performance
(black line) may be stable, but there will be variability in performance. Assumptions of central
tendency treat this variability as error (i.e., variability carries little information about the
source). However, measures of variability in a wide variety of natural and manmade phenomena
(e.g.,  forest  fires,  avalanches,  water  levels  in  lakes,  traffic  patterns  on  the  road,  traffic  on 
telephone  lines;  Jensen,  1998;  Newman,  2005)  indicate  that  there  are  specific  patterns  of 
variability  in  ‘primary’  measures  of  phenomena  that  represent  underlying  states  of  the  overall 
system  (e.g.,  day  to  day  variability  in  water  levels  provides  insight  into  the  overall  properties  of 
the  lake,  such  as  drought  conditions).  Research  in  dynamic  systems  suggests  that  variability  is 
not  necessarily  random;  in  the  examples  mentioned  above  there  are  meaningful,  complex 
patterns  in  behavior  which  are  often  revealed  by  time  series  analyses  (a  time  series  is  the  time 
ordered  series  of  repeated  measurements  for  an  entire  data  collection  epoch).  Key  to  the  issue  of 
state  assessment  is  that  variability  patterns  measured  in  a  primary  signal  (e.g.,  a  primary  task 
performance  activity)  can  reflect  the  qualitative  state  of  the  system  as  a  whole  (such  as 
approaching  the  red  line). 

From a dynamical systems perspective, the assumption is that any type of complex
system  will  have  interactions  between  underlying  components  and  processes  that  will  influence 
the  measured  outcome  (e.g.,  Takens  1981).  The  effects  of  these  interactions  only  become 
apparent  when  data  is  observed  across  time  (rather  than  collapsed  in  time  as  with  an  average).  In 
general  terms  from  complexity  theory,  dynamic  systems  exhibit  a  variable,  yet  globally  stable 
‘macrostructure’  (e.g.,  performance  or  behavior)  coupled  to  a  highly  variable  ‘microstructure’ 
(e.g., components or processes) (Kelso, 2005; Kloos and Van Orden, 2010). Note that
complexity  theory  is  somewhat  agnostic  to  what  the  components  are;  analyzing  data  across  time 
often  reveals  properties  of  the  coupling  and  interactions  between  components  and  processes 
without  identification  of  the  components  themselves. 

Motivated  by  these  broader  patterns  in  nature  (e.g.,  self-organization  and  spontaneous 
order;  Kugler,  Kelso  &  Turvey,  1982),  Kelso  demonstrated  that  qualitative  ‘phase  shifts’  in 
performance  can  be  measured  by  quantitative  analysis  of  variability  patterns  over  time.  Kelso 
demonstrated these complex phase-shift relationships with a model system: finger tapping.
Participants were asked to move both their left and right index fingers in time with a metronome.
Participants tended to exhibit one of two stable tapping states between their fingers: either
in-phase (both index fingers ‘up’ then both ‘down’) or anti-phase (one finger up, the other down).
Participants were allowed to move their fingers in whichever orientation was ‘comfortable’. As
the metronome speed was increased, fluctuations, or phase shifts, between the two patterns began
to occur. Each phase shift was preceded by spikes in variability (critical fluctuations), or a
regularity or periodicity (critical slowing down) in the variability patterns of the primary time
series (Kelso, 1995).

Kelso’s  body  of  work  on  phase  transitions  has  motivated  and  informed  other  areas  of 
human performance. For example, qualitative shifts in movement (e.g., from walking to
running) can be measured by the variability patterns in the coordination of limbs (Harrison &
Richardson, 2009). When two individuals are “harnessed” together, a qualitative shift into
organized quadrupedal movement between the two individuals is established, as quantified by a
change in variability in the limb movements between the two individuals (Harrison &
Richardson, 2009). Crites and Gorman (2013) report different patterns of variability in novel vs.
existing skill acquisition. In addition to motor control research, Van Orden et al. (2005) show
that primary measures of reaction time exhibit specific patterns of variability, which are thought to
be inherent to normal cognitive performance. Taken together, there is evidence suggesting that
critical patterns of variability in primary measures can describe qualitative shifts in behavior, and
furthermore that changes in variability patterns may precede these shifts. If future qualitative
shifts in operator state can be quantified by patterns of variability exhibited in the behavior itself, it
may provide an alternative approach for state assessment.

2.1  Dynamic  Approaches  to  Assessment 

Regardless of the choice of signal, an important analytical question is how to quantify the
signal in a way that meaningfully represents the state of the operator. As previously
mentioned, conventional approaches to this problem quantify signals as some type of average
value (e.g., average HRV in a frequency band (Jorna, 1992); average EEG activity (Wilson,
2002)). Certainly measuring average values will be important information for state assessment
(or  any  type  of  data  analysis),  but  given  the  potential  benefit  of  time  series  analyses  it  makes 
sense  to  also  measure  patterns  over  time. 

The  following  examples  are  methods  for  analyzing  data  via  time  series  analysis,  and  are 
presented as demonstrations of their respective types of variability, or dynamic structure. It is
generally expected that patterns of behavior emerge and change over the course of learning and
experience (Warren, 2006; Davids et al., 2008) and are constrained by both intrinsic (internal)
and extrinsic (task) dynamics (Holden, Choi, Amazeen, & Van Orden, 2011; Kloos and Van
Orden, 2010; Kelso, 1995). In other words, by manipulating external constraints in an
experimental context, changes to internal constraints are likely to result, and these changes are
likely to be measured by time series analyses of the signal. For the present work, analyses in
both  the  frequency  and  time  domains  were  used  in  order  to  leverage  multiple  measures  of 
dynamic  structure. 

2.1.1  Frequency  Measures  of  Dynamic  Structure 

Frequency  analyses  assess  the  level  of  dynamic  structure  based  on  the  amount  of 
randomness  vs.  dependence  that  is  present  in  the  data.  Frequency  analyses,  specifically  power 
spectral  density  (PSD)  correlations  of  frequency  to  absolute  power,  as  computed  through  the  Fast 
Fourier  Transform  (FFT),  make  distinctions  about  the  level  of  randomness  and  structure  in  a 
time series. Once the PSD output is converted to logarithmic scales, a linear regression is
fit. The slope of the regression equation is a measure of the relationship between the
frequency and power exhibited by the time series, which indicates the level of persistence
observed in the time series. Persistence can be thought of as the degree to which values depend
on previous values (i.e., dependence). For complex systems, the regression relationship is a
power law fit. The slope values reported are referred to as scaling exponents, or β values (Eke et
al., 2002).
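
As a rough illustration of the slope estimate described above, the following sketch computes a raw periodogram with a small FFT, regresses log power on log frequency, and reports the slope. It is a minimal sketch under stated assumptions, not the analysis pipeline used in this report: the function names (fft, spectral_slope), the series length of 1024, the fixed random seed, and the use of an unwindowed periodogram over all positive frequencies are all choices made for the example.

```python
import cmath
import math
import random


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectral_slope(series):
    """Return the slope of log10(power) regressed on log10(frequency)."""
    n = len(series)
    mean = sum(series) / n
    spectrum = fft([v - mean for v in series])
    # Keep positive frequencies only and skip the zero-frequency (DC) term.
    ks = range(1, n // 2)
    xs = [math.log10(k / n) for k in ks]
    ys = [math.log10(abs(spectrum[k]) ** 2 / n) for k in ks]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx  # ordinary least-squares slope


rng = random.Random(0)  # assumption: fixed seed so the example is repeatable
white = [rng.gauss(0.0, 1.0) for _ in range(1024)]

# Assumption: a running sum of white noise (a random walk) stands in for brown noise.
brown = []
total = 0.0
for step in white:
    total += step
    brown.append(total)

white_slope = spectral_slope(white)
brown_slope = spectral_slope(brown)
print(f"white noise slope: {white_slope:.2f}")
print(f"brown noise slope: {brown_slope:.2f}")
```

The white noise slope should come out close to zero and the random walk slope strongly negative. Because the fit here covers every frequency up to the Nyquist limit, the random walk estimate tends to be somewhat shallower than the nominal value of -2; published analyses often restrict the fit to the lower frequencies.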

Slopes (β values) at or near zero are indicative of random processes, or white
noise processes, in which all observed frequencies have equal power, as shown in Figure 2. As
the frequency to power relationship inverts, such that lower frequencies show proportionally
higher power, negative slope values are observed. Negative β values between -0.5 and -1.5 are
indicative of a specific type of persistence called pink noise or 1/f noise, shown in Figure 3.
Rather than all frequencies exhibiting equal power, for 1/f noise power and frequency are
inversely related such that lower frequencies show greater power and vice versa. Figure 4
depicts a time series with even greater dependence, as indicated by β values between -1.5 and -2.5,
which are often referred to as brown noise. Most time series of human phenomena exhibit β
values which can be described as fitting one of these three categories (white noise, 1/f noise,
brown noise). Note that in all cases presented here, the mean value for the time series is zero;
the obvious qualitative differences between the examples are revealed by time series analysis, as
opposed to averages.
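
The three slope ranges given above can be written as a small lookup. This is only a sketch: the function name classify_beta is hypothetical, and the handling of the boundaries (which category owns exactly -0.5 or -1.5, how far from zero still counts as "near zero", and what to do with slopes outside all three ranges) is an assumption, since the text does not specify it.

```python
def classify_beta(beta):
    """Map a spectral slope onto the noise categories described in the text."""
    if -0.5 < beta <= 0.5:
        # Assumption: "at or near zero" is taken to mean within 0.5 of zero.
        return "white"
    if -1.5 < beta <= -0.5:
        return "pink"  # 1/f noise
    if -2.5 <= beta <= -1.5:
        return "brown"
    # Assumption: anything else falls outside the three named categories.
    return "unclassified"


for slope in (0.0, -1.0, -2.0, -3.0):
    print(slope, classify_beta(slope))
```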


Figure 2. A randomly generated white noise time series (left) and Power Spectral Density Output (right). β = 0
indicates no correlation between frequency (y axis) and power (x axis). Note that the time series has a mean
of zero and a standard deviation of 1.

Figure 3. A randomly generated pink noise time series (left) and Power Spectral Density Output (right). β = -1
indicates an inverse (1/f) correlation between frequency (y axis) and power (x axis). Note that the time series has a
mean of zero and a standard deviation of 1.


Figure 4. A randomly generated brown noise time series (left) and Power Spectral Density Output (right). β = -2
indicates a large inverse correlation between frequency (y axis) and power (x axis). Note that the time
series has a mean of zero and a standard deviation of 1.


The three examples can also be defined in terms of constraints. A system that is
completely unconstrained will exhibit white noise properties. Alternatively, brown noise
systems are highly constrained and mechanical. In the middle, 1/f systems exhibit a loose
coupling that has been reported as a characteristic in a variety of dynamic systems (Newman,
2005). This 1/f noise has been described as a hallmark of systems that are interaction dominant;
it  represents  a  ‘meta-stable’  property  of  systems  that  are  variable  (but  not  random)  and  coupled 
(but  not  mechanical)  (Jensen,  1998;  Van  Orden  et  al,  2005). 

2.1.2  Time  Based  Measures  of  Dynamic  Structure 

In  addition  to  frequency  domain  analyses,  time  domain  methods  exist  to  further  explore 
the  levels  of  dynamic  structure  exhibited  by  complex  systems.  Recurrence  Quantification 
Analysis  (RQA)  is  one  such  method  of  determining  the  degree  of  patterning  and  dynamic 
structure in a time series. Essentially, an N x N matrix plot (where N is the time series length; the
simplest method plots a time series against itself) is generated. As depicted in Figure 5a and b,
any shaded area represents a “match” or recurrent point. The ratio and locations of these
recurrent points provide the basic units of analysis in this method. The first of these metrics is
percent recurrence (%REC), which is the ratio of recurrent points to all possible points. Percent
recurrence represents the proportion of “states” that repeat or recur across the time series. A
second measure, percent determinism (%DET), is the percentage of recurrent states that repeat in
the same order each time; deterministic points appear as diagonal line structures in the matrix.
Note the large diagonal in the center which splits the plot into two identical halves. For RQA,
the time series is plotted one to one against itself (i.e., the diagonal is not meaningful; a time series
will always be identical with itself along the center diagonal) and only half of the plot is used for
computation.
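
The two metrics just described can be sketched in a few lines. The sketch below is illustrative only and makes several assumptions that the text does not state: two values "match" when they differ by no more than a radius (zero by default), a diagonal line must contain at least two points to count toward determinism, and no time-delay embedding is applied. The function name rqa and its parameters are hypothetical.

```python
def rqa(series, radius=0.0, min_line=2):
    """Return (%REC, %DET) for one series compared against itself.

    Only the half of the matrix above the central diagonal is used,
    since the central diagonal always matches and the halves are identical.
    """
    n = len(series)
    recurrent = 0  # recurrent points in the upper triangle
    on_lines = 0   # recurrent points that sit on a diagonal line
    for offset in range(1, n):  # each diagonal above the central one
        run = 0
        for i in range(n - offset):
            if abs(series[i] - series[i + offset]) <= radius:
                recurrent += 1
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
        if run >= min_line:  # close a line that reaches the matrix edge
            on_lines += run
    possible = n * (n - 1) // 2
    pct_rec = 100.0 * recurrent / possible
    pct_det = 100.0 * on_lines / recurrent if recurrent else 0.0
    return pct_rec, pct_det


periodic = [0, 1, 2, 3] * 5     # repeats the same states in the same order
scattered = [5, 1, 5, 2, 5]     # repeats a state, but never in sequence
print(rqa(periodic))
print(rqa(scattered))
```

In the periodic series every recurrent point lies on a diagonal line, so determinism is total; in the second series the value 5 recurs but never as part of a repeated sequence, so determinism is zero.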

Similar  to  the  previous  frequency  analysis  examples,  RQA  can  describe  the 
characteristics  of  the  system  that  produced  the  time  series.  Webber  and  Zbilut  (2005)  note  that 
an unconstrained or white noise (e.g., random process; Figure 5a) system will show
levels of recurrence and determinism that are at chance levels. Highly constrained systems (e.g., a
sine wave; Figure 5b) will produce very high values for %REC and %DET as the system repeats
the same patterns in the same order. Between these two extremes, loosely constrained systems
will show moderate patterning; they exhibit greater than chance levels of recurrence and
determinism,  but  not  at  extreme  levels  that  would  be  seen  in  highly  mechanical  systems. 


Figure 5. a.) A random process plotted against itself. Shaded areas represent recurrent points, which occur
as a matter of chance, as do diagonal line structures. b.) A sine wave plotted against itself. Shaded areas
represent recurrent points, which always occur in the same period as the sine wave itself; nearly all points fall
on a diagonal line structure.

A standard RQA provides an estimate of dynamic structure in a system using a single
variable; however, the mathematics are equally able to provide estimates of structure and
coupling between two variables (or systems). In this method, Cross Recurrence Quantification
Analysis (CRQA; Webber & Zbilut, 2005), the same metrics from a standard RQA are computed,
but for a matrix that compares two different time series (e.g., an N1 x N2 matrix), as shown in
Figure  6  and  Figure  7.  Rather  than  define  self-similar  patterns  of  dynamic  structure  (RQA), 
higher  levels  of  cross  recurrence  (%CREC)  indicate  similarity  between  the  two  time  series  (e.g., 
when  there  is  a  dot  in  the  matrix  the  two  time  series  shared  the  same  value)  and  %CDET  is  a 
general  indicator  of  coupling  between  the  two  time  series  (still  visible  as  diagonal  lines  in  the 
matrix). 
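
A minimal sketch of the cross recurrence computation follows. As with any such sketch, the details are assumptions rather than the procedure used in this report: values match when they differ by no more than a radius (zero by default), diagonal lines need at least two points, and no embedding is applied. The function name crqa and its parameters are hypothetical. Unlike the single-series case, the whole matrix is used, including the central diagonal.

```python
def crqa(series_a, series_b, radius=0.0, min_line=2):
    """Return (%CREC, %CDET) for two series compared against each other."""
    n1, n2 = len(series_a), len(series_b)
    # Cross recurrence matrix: rows index series_a, columns index series_b.
    matrix = [[abs(a - b) <= radius for b in series_b] for a in series_a]
    recurrent = sum(row.count(True) for row in matrix)
    on_lines = 0
    # Walk every diagonal; offset is (column index - row index).
    for offset in range(-(n1 - 1), n2):
        i = max(0, -offset)
        j = i + offset
        run = 0
        while i < n1 and j < n2:
            if matrix[i][j]:
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
            i += 1
            j += 1
        if run >= min_line:  # close a line that reaches the matrix edge
            on_lines += run
    pct_crec = 100.0 * recurrent / (n1 * n2)
    pct_cdet = 100.0 * on_lines / recurrent if recurrent else 0.0
    return pct_crec, pct_cdet


leader = [1, 2, 3, 4, 5]
follower = [0, 1, 2, 3, 4]  # same values, one step later
print(crqa(leader, follower))
```

Here the second series repeats the first with a delay of one sample, so the matching points form a single diagonal line next to the central diagonal and every recurrent point counts toward determinism.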

CRQA  provides  a  third  way  to  further  quantify  the  level  of  coupling  between  two  time 
series.  Whereas  a  standard  RQA  has  a  diagonal  that  is  not  meaningful  at  a  time  lag  of  zero,  a 
diagonal  line  at  lag  zero  in  a  CRQA  is  a  further  indication  of  the  level  of  synchronized  coupling 
of  the  two  time  series  (Dale,  2011).  Analysis  of  the  Diagonal  Recurrence  Profile  (DRP)  is 
similar  to  an  autocorrelation  function.  The  diagonal  recurrence  profile  computes  the  percentage 
of values that recur along different levels of “lag”. Lag 0 is computed along the diagonal (e.g.,
do the two time series have the same value at the same time). A lag of 1 would compute the
proportion at +/- 1 measurement in the time series from time zero and so on (e.g., a state that
occurs at time x in N1 recurs at time x + 1 in N2). As shown in Figure 6, higher levels of diagonal
recurrence (%DREC) along a lag of zero indicate a high level of synchronicity between the two
time series. Figure 7 shows a cross recurrence matrix for two time series that exhibit low levels
of similarity and coupling. Time series that are not strongly coupled will show low levels of
%DREC at all lag values. Although the present work will focus on %DREC at a lag of zero, it
should  be  noted  that  high  %DREC  at  lag  values  other  than  zero  could  be  indicators  of  coupling 
between  the  time  series  in  a  leader/follower  relationship  (Richardson  &  Dale,  2005). 
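
The diagonal recurrence profile can be sketched directly from its definition, without building the full matrix. The following is an illustrative sketch under assumptions: the function name and parameters are hypothetical, matching uses a radius of zero by default, and the sign convention (a positive lag means the second series shows a state later than the first) is chosen to agree with the "time x in N1, time x + 1 in N2" example above.

```python
def diagonal_recurrence_profile(series_a, series_b, max_lag, radius=0.0):
    """Return {lag: %DREC} for lags from -max_lag to +max_lag.

    For each lag, %DREC is the percentage of overlapping samples where
    series_a at time t matches series_b at time t + lag.
    """
    n = min(len(series_a), len(series_b))
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        # Only time points where both t and t + lag fall inside the series.
        pairs = [(series_a[t], series_b[t + lag])
                 for t in range(n) if 0 <= t + lag < n]
        hits = sum(1 for a, b in pairs if abs(a - b) <= radius)
        profile[lag] = 100.0 * hits / len(pairs) if pairs else 0.0
    return profile


leader = [1, 2, 3, 4, 5, 6]
follower = [0, 1, 2, 3, 4, 5]  # shows each state one sample after the leader
print(diagonal_recurrence_profile(leader, follower, max_lag=2))
```

In this example %DREC is zero at lag 0 and total at lag 1, the pattern that would be read as a leader/follower relationship. Two identical series would instead peak at lag 0.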


Figure  6.  An  example  cross  recurrence  plot  for  two  time  series:  Series  1  (Y-Axis)  and  Series  2  (X-Axis). 
Shaded  grey  areas  represent  matching  values  between  the  two  series  (recurrence).  Line  structures  (an 
example  is  circled  in  red)  represent  matching  values  in  an  order  (determinism).  Diagonal  Recurrence 
appears  as  a  line  structure  along  the  diagonal.  The  high  level  of  diagonal  recurrence  presented  in  this  figure 
indicates high (but not total) coupling between the two time series.



Distribution  A:  Approved  for  public  release;  distribution  unlimited. 
88  ABW  Cleared  09/08/2014;  88ABW-2014-4229. 


Figure  7.  An  example  cross  recurrence  plot  for  two  time  series:  Series  1  (Y-Axis)  and  Series  2  (X-Axis). 
Shaded grey areas represent matching values between the two series (recurrence). Line structures (an
example is circled in red) represent values that recur in order (determinism). This plot shows low
levels of diagonal recurrence, which indicates low coupling between the two time series.


2.2  Eye  Gaze:  Dynamic  Measures

Eye  gaze  has  been  shown  to  be  important  even  in  commonplace,  everyday  tasks  (e.g., 
making tea, making a sandwich; Land & Hayhoe, 2001). The visual aspect of many current
military operations (e.g., RPA operators, threat detection in surveillance video/images, cyber
operations) leads to an expectation that eye gaze is relevant to operator performance via the
generally accepted link between eye gaze and attention, and the further link between attention and
performance (Galster & Parasuraman, 2013).

Although the link between vision and attention is not absolute (i.e., attention can be
shifted around the visual field; Heinen et al., 2011), the typical operational settings described above
require attention to small details (e.g., requiring foveal fixation). Given this constraint,
eye gaze may very well serve as a primary measure of performance. This is not in and of itself a
novel idea; the work domains may have changed, but the link between eye gaze and attention
is not new. Eye gaze has been theoretically linked to attention and cognition in work ranging from the early
foundational work in eye gaze measurement (Yarbus, 1967), other early work on instrument
sampling in aviation (Carbonell et al., 1968), and the ‘spotlight’ metaphor for eye gaze and attention
(e.g., Posner et al., 1980), to more recent applications of eye gaze in reading (reviewed by
Rayner, 1998) and general work regarding eye movements (Kowler, 2011). While the interest
in eye gaze and its links to attention are not new topics, the capability to readily measure and
record eye movements unobtrusively and in operational settings is a more recent development that
could be implemented for purposes of state assessment (Duchowski, 2002).

In  addition  to  the  previous  examples  linking  eye  gaze  to  performance,  eye  gaze  measures 
have  been  linked  to  operator  workload.  May  et  al  (1990)  report  a  decrease  in  the  number  and 
range  of  eye  movements  during  free  view  when  participants  performed  a  secondary  counting 
task.  The  range  showed  further  reduction  as  secondary  task  difficulty  was  increased.  In  a  more 
applied  setting,  driving,  a  narrowing  of  visual  attention,  or  “tunnel  vision”,  has  been  observed 
under  high  workload  (e.g.,  Reimer,  2009).  Tunnel  vision  is  often  accompanied  by  an  increase  in 
the number of fixations and a corresponding decrease in the length of fixation. It would then be
expected that manipulating task difficulty in an experiment will likely produce changes in
gaze patterns.

Yarbus’  (1967)  work  on  eye  gaze  patterns  in  complex  scene  viewing  provides  further 
foundation  for  the  expectation  that  simple  changes  in  experimental  context  can  produce  vast 
differences in gaze patterns. Yarbus was one of the first, if not the first, to measure gaze patterns using
an eye tracking apparatus. Yarbus showed participants a series of images while tracking eye
gaze, and provided different questions about the image for participants to ‘keep in mind’
while viewing the images. A sample image, “The Unexpected Visitor”, is depicted in Figure 8
(illustration adapted from Yarbus, 1967; figure from Land & Tatler, 2009).

Figure 8. Eye gaze traces from Yarbus (1967). Each trace represents data for one participant examining a picture
(The Unexpected Visitor) with different questions in mind: (a) Free examination, (b) Estimate the material
circumstances of the family in the picture, (c) Give the ages of the people, (d) Surmise what the family had
been doing before the arrival of the ‘unexpected visitor’, (e) Remember the clothes worn by the people, (f)
Remember the position of the people and objects in the room, (g) Estimate how long the unexpected visitor
had been away from the family.


By asking different questions, such as “Estimate the material circumstances of the family
in the picture” (Figure 8b) or “Give the ages of the people” (Figure 8c), Yarbus elicited gaze
patterns that were clearly different in their qualitative form. When asked about wealth,
participants scanned objects in the image; when asked about the ages of people, participants looked at
faces. While this discrepancy in scan patterns may seem obvious, the ability to
quantify these types of qualitative changes in gaze pattern provides a potentially informative way
to measure operator state.

Again, conventional approaches to quantifying eye movements in tasks that involve
active participation of the participant (i.e., active tasks) include average fixation length or
average movement velocity (e.g., May et al., 1990; Hayhoe et al., 1998; Kowler, 2011). As has
been stated, this type of approach likely misses potentially useful information carried by
variability patterns in eye gaze time series.

Initial  research  using  time  history  analyses  (utilizing  measures  of  dynamic  structure)  has 
been  conducted  by  Aks  et  al.  (2002).  Similar  to  other  complex  systems,  visual  search  involves 
many  interacting  processes  and  components,  including  the  influences  of  the  experimental  task, 
leading  Aks  et  al.  to  hypothesize  that  eye  gaze  time  series  would  exhibit  dynamic  structure  in  a 
visual search task. The task used was searching for a target (uppercase T) among distracters
(uppercase E). The results indicate that the Euclidean distance between subsequent measurements
(X1-X2 and Y1-Y2 pixel position) recorded in visual search tasks exhibits temporal structure in
the range of brown noise (β ≈ -2). This initially suggested a high level of dependence between
fixations. There was some concern that position data alone could produce spurious brown noise,
due to constraints that the screen size imposed on the gaze time series. This led the researchers
to analyze an additional metric, angular change between eye movements. Angular
change measures the difference between subsequently tracked positions in angular units rather
than distance units. When the raw gaze time series were converted to angular changes between
positions, the analysis revealed a 1/f (β ≈ -1) correlation.

Stephen and Anastas (2011) re-analyzed data from an earlier publication (Stephen and
Mirman, 2010) and confirmed the findings of Aks et al. (2002) with regard to the dynamic structure
observed in eye movement time series. However, Stephen and Anastas (2011) went a bit further,
analyzing the relationship between dynamic structure and reaction time using growth curve
modeling. The data suggest that angular-change time series exhibiting
patterns of 1/f noise are related to decreases in reaction time, an improvement in the performance
measure for the task.

Frequency analyses provide a general classification of eye gaze (e.g., random vs.
structured), but this general classification is likely complemented by more explicit measures of
coupling and similarity from time domain measures of cross recurrence. Richardson and Dale
(2005) used cross recurrence of eye gaze time series as a way to understand the coupling
between speakers and listeners when telling a story. Two participants had separate screens with
identical depictions of characters from a popular television show. One participant told a
predetermined story about an episode of the television show (speaker). The listener had to
respond to a series of questions about this story. Both participants’ gaze was tracked while the
story was told, and was analyzed via cross recurrence. Listeners whose gaze patterns showed
higher coupling with gaze patterns of speakers (as measured through % Diagonal Recurrence)
also exhibited better retention when asked questions about the story. Figure 9 depicts a sample
cross recurrence plot for a listener/speaker dyad as presented in Richardson and Dale (2005) with
relatively strong coupling in their eye gaze patterns.



Distribution  A:  Approved  for  public  release;  distribution  unlimited. 
88  ABW  Cleared  09/08/2014;  88ABW-2014-4229. 


[Figure 9 image: cross recurrence plot; X-axis: Speaker, Y-axis: Listener; both axes run from 0 to 2000]

Figure 9. Cross recurrence plot for one listener (Y-Axis) and speaker (X-Axis) dyad from the experiment
conducted by Richardson & Dale (2005). Shaded grey areas represent the two individuals looking at the
same location on their respective screens. This pair shows a relatively high level of diagonal recurrence,
indicating a high level of time synchronized coupling between listener and speaker.


3.0  EXPERIMENT  1 

3.1  Introduction 

Overall, there is evidence to suggest not only that dynamic patterns are exhibited by eye gaze
time series, but also that the same dynamic patterns can show relationships with some performance
outcome (e.g., reaction time, Stephen & Anastas, 2011; learning or comprehension, Richardson & Dale,
2005). Combined with general findings relating changes in eye gaze to low and high
workload, there is potential for time series analyses to categorize dynamic patterns of variability
in eye gaze that are potentially informative for operator state assessment. This project is an
exploration of this idea; the goal is to learn if additional information about operator state can be
gained from dynamic measures of eye gaze when task demands are manipulated in an experimental
context.

In the current project, it was expected that participants’ gaze patterns would exhibit
dynamic structure, as measured via time series analyses. Changes in dynamic structure observed
in eye movement time series are likely indicative of underlying organizational and structural
changes within the cognitive and visual systems. Both frequency and time based measures of
dynamic structure were tested. These alternative indices were expected to provide additional
information when compared to conventional (average based) measures of eye gaze behavior
(e.g., average fixation time). As task demands shift, and participants adapt, qualitative gaze
behavior is likely to shift (e.g., Kelso, 2005; Kloos & Van Orden, 2012). This is likely to be
reflected in the properties of dynamic patterns, resulting in different but stable patterns of
variability (e.g., changes in β and cross recurrence values).

The  current  study  measured  eye  gaze  in  a  visual  task  with  a  cognitive  component. 
Specifically,  the  task  was  a  visual  puzzle  task  in  which  participants  were  asked  to  unscramble  an 
image.  Given  the  nature  of  the  task,  eye  gaze  is  considered  a  primary  measure  of  performance. 
This  type  of  task  provided  a  way  to  manipulate  task  demands  by  changing  the  constraints  of  task 
difficulty,  practice,  and  the  addition  of  a  secondary  task.  Task  difficulty  was  manipulated  by 
changing  the  way  in  which  the  image  can  be  scrambled;  in  one  condition  puzzle  pieces  had  the 
potential  for  rotation.  This  manipulation  provided  a  way  to  control  for  any  potential  difficulty 
effects  of  any  individual  image,  while  still  manipulating  task  difficulty  (i.e.  the  information 
content  of  each  piece  of  the  puzzle)  in  a  significant  way.  Multiple  trials  of  the  same  difficulty 
level  allowed  for  potential  changes  in  dynamic  structure  due  to  learning  or  strategy  (i.e.,  practice 
effects)  to  be  observed.  Finally,  aside  from  general  task  difficulty,  a  secondary  task  was 
implemented  to  further  tax  participants’  attention  and  capacity. 

As  a  first  step  in  using  eye  gaze  for  state  assessment,  the  current  project  tested  discrete 
levels  of  task  difficulty  (as  opposed  to  a  continuous  increase  in  difficulty),  as  a  way  to  determine 
if  differences  in  eye  gaze  exist  that  could  be  representative  of  a  ‘pre’  and  ‘post’  red  line  situation. 
Rather  than  stipulate  explicit  directional  hypotheses,  the  current  questions  are  explicitly  two 
tailed.  It  is  difficult  to  specify  a  direction  of  the  changes  in  dynamic  structure  at  the  outset  of  this 
project.  Changes  in  task  demands  could  create  disruptions  (i.e.  critical  fluctuations  add  noise  to 
the  system)  and  as  a  result  randomness  (e.g.,  a  ‘whitening’  of  the  time  series)  could  be  observed. 
Alternatively,  changes  in  task  demands  could  further  constrain  the  possibilities  for  action;  this 
would  result  in  higher  levels  of  dynamic  structure  in  eye  movements  (i.e.  critical  fluctuations; 
system  becomes  more  periodic).  Either  direction  provides  insight  into  underlying  processes,  and 
potential  classification  of  the  operator. 

Practice effects may also influence the dynamic patterns observed; however, it is again
difficult to specify a direction of change in dynamic structure. A serial or other highly
structured  scan  path  could  be  implemented  early  in  learning,  and  with  learning  participants  could 
shift  to  a  less  constrained  scan  path.  Alternatively,  scan  paths  could  initially  exhibit  more 
randomness,  and  show  an  increase  in  structure.  Again,  either  direction  could  provide  insight  into 
the  underlying  state  of  the  operator. 

3.2  Methods


3.2.1  Participants 

Thirty-two total participants with ages ranging from 18-30 years from a Midwestern
university population were recruited to participate and were compensated with course credit or
were paid $30. One participant was dropped due to a calibration error with the eye tracking
equipment. Thirty-one total participants are included in the subsequent analysis. Biographic
information was collected via self-report questionnaire. There were 14 male and 17 female
participants with a median age of 23. All reported normal or corrected to normal vision. Highest
education level completed was as follows: high school (15), associate’s degree (3), bachelor’s
degree (7), and graduate degree (6). Experience with video games was assessed, with a range of
0 to 16 hours per week reported, with an average of 3.16 (SD = 3.2) hours of video game play
per week.

3.2.2  Materials  &  Apparatus 

Eye gaze was measured via a Facelab “off the head” eye tracker, hosted on a Dell
Latitude D830 laptop computer (2.2 GHz processor, 2 GB RAM). This combination allowed for
+/- 1 degree of visual angle eye tracking capability at a collection rate of 60 Hz. Facelab API
v4.6 (reference) was integrated with custom software written to display images for this
experiment. The output of the tracking software was the X and Y pixel location of participants’
gaze every 16.7 ms. The participant station was an HP Compaq DC80 desktop computer (2.3
GHz processor, 3.5 GB RAM) and an LCD monitor (Samsung 940BX) with a screen area of 30 cm
by 37.5 cm (48 cm diagonal), and a resolution of 1280 x 1024 pixels.

Images were sized at 1020 x 1020 pixels, which at a viewing distance of approximately
60 cm is approximately 27 degrees of visual angle. When subdivided into 36 equal sized square
pieces for the puzzle, each piece was 170 pixels square. At a 60 cm viewing distance, each puzzle
piece subtended approximately 4.5 degrees of visual angle.
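
The visual angles quoted here follow from the screen geometry given in Section 3.2.2. As a rough check, the sketch below converts pixel extents to degrees of visual angle. It assumes that the 1280-pixel screen width spans the reported 37.5 cm and that the eye sits 60 cm from the centre of the stimulus; the function and constant names are illustrative and are not part of the experiment software.

```python
import math

SCREEN_WIDTH_CM = 37.5      # reported screen width
SCREEN_WIDTH_PX = 1280      # reported horizontal resolution
VIEWING_DISTANCE_CM = 60.0  # approximate viewing distance (assumed exact here)


def visual_angle_deg(size_px, distance_cm=VIEWING_DISTANCE_CM):
    """Angle subtended by a centred extent of size_px pixels."""
    size_cm = size_px * SCREEN_WIDTH_CM / SCREEN_WIDTH_PX
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


image_angle = visual_angle_deg(1020)  # full 1020-pixel image
piece_angle = visual_angle_deg(170)   # one 170-pixel puzzle piece
print(f"image: {image_angle:.1f} deg, piece: {piece_angle:.1f} deg")
```

Under these assumptions the sketch gives roughly 28 degrees for the image and a little under 5 degrees per piece, close to the approximate values reported; a slightly longer viewing distance would account for the small gap.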

3.2.3  Image  Selection 

Initial images were selected from public domain sources (e.g., Wikipedia). Images
containing human faces were excluded. In addition, all images were selected to contain a
“natural” correct orientation. Early pilot testing of “non-oriented” still life images suggested that
a participant in the rotated condition could solve the puzzle such that the pieces appeared to be
correctly matching yet the entire puzzle was rotated (i.e., the puzzle was put together in a way
that all the pieces ‘matched’, but were all upside down). Twelve images meeting these criteria
were initially selected.

In  order  to  select  the  five  images  needed  for  Experiment  1,  the  12  images  were  pilot 
tested  by  4  participants  meeting  the  recruitment  requirements  described  above.  Participants 
unscrambled  all  12  images  in  a  randomized  order  for  the  standard  puzzle  condition  (see  below). 
Images were then ranked based on average time to completion. Time series analyses require a
minimum number of samples for a valid analysis; therefore, the five images that had the longest
completion times were chosen, provided they were solved by all pilot participants. To determine
if there were any rank differences between participants, these five images were subjected to a
nonparametric Friedman rank order test. No significant differences were observed.

To minimize order effects and the influence of properties of a specific image, images were
counterbalanced in pairs (see below). Figure 10 depicts image pair 1: an image of a mountain
lake (left) and an image of sunflowers (right). Figure 11 depicts image pair 2: an image of the
skyline of the city of Cleveland (left) and an image of an antique printing press (right). Figure
12 is the image used for the fifth trial (see below), which is an image of trees along a walkway.


Figure  10.  Image  pair  1  (Mountain  Lake,  Left;  Sunflowers,  Right)  was  always  presented  in  trials  1  &  2  and 
was  counterbalanced  such  that  across  participants  both  images  were  seen  in  standard  and  complex 
configurations  and  in  different  presentation  orders. 


Figure  11.  Image  pair  2  (Cleveland  skyline,  Left;  Antique  Printing  Press,  Right)  was  always 
presented  in  trials  3  &  4  and  was  counterbalanced  such  that  across  participants  both  images 
were  seen  in  standard  and  complex  configurations  and  in  different  presentation  orders. 

Figure 12. The image used for trial 5 was presented with a between subjects manipulation of puzzle
type. All participants in the respective conditions saw the same standard & complex puzzle
configurations.


3.2.4  Procedure & Design

Participants  received  computer-based  training  about  task  procedures  and  how  to 
manipulate  puzzle  pieces.  Participants  were  then  given  two  5x5  training  puzzles  to  familiarize 
themselves  with  the  task.  The  first  puzzle  appeared  with  non-rotated  pieces  and  the  second 
puzzle  included  rotated  pieces  (see  description  of  rotation  below).  Participants  had  an  unlimited 
time  to  complete  the  training  puzzles  and  could  ask  questions  at  any  time. 

Between trials, participants were shown a black target dot on an otherwise white
screen. Participants were asked to fixate on the dot and, after doing so, initiate the task by left
clicking the mouse. The intact image was then displayed for 5 seconds. Then the image was split
into 36 (6 x 6 grid) equal sized squares. These squares were scrambled randomly such that all
pieces  changed  position.  The  participants’  task  was  to  rearrange  the  squares  back  into  the 
original  image,  within  a  15  minute  time  limit.  Once  an  image  was  completed  (or  timed  out  at  15 
minutes)  the  fixation  screen  came  up  and  participants  proceeded  to  the  next  trial  at  their  own 
pace. 


The  difficulty  manipulation  was  implemented  by  changing  the  attributes  of  puzzle  pieces 
that  were  needed  to  solve  the  puzzle  correctly.  In  the  standard  condition,  images  were 
scrambled  by  x-y  location  only.  In  the  complex  condition,  image  pieces  could  be  rotated  in 
addition  to  the  x-y  location  manipulation.  Rotation  was  in  90  degree  intervals,  leaving  4 
potential  orientations  (0,  90,  180,  270  degrees  from  horizontal).  Each  orientation  was  fixed  to 
25%  of  pieces  (9  pieces  per  orientation),  but  the  selection  of  pieces  was  random  across 
participants.  This  ensured  that  all  participants  had  the  same  level  of  rotation,  with  random 
variation  in  the  exact  puzzles  seen. 
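
The scrambling constraints described in this section (every piece changes position and, in the complex condition, each of the four orientations is given to exactly 9 of the 36 pieces) can be sketched as follows. The algorithm used by the experiment software is not described in this report, so this is only one way to satisfy the stated constraints; the rejection-sampling approach and all names are assumptions made for illustration.

```python
import random

GRID = 6
N_PIECES = GRID * GRID
ORIENTATIONS = (0, 90, 180, 270)  # degrees from horizontal


def scramble_positions(rng):
    """Return a permutation in which no piece keeps its original slot."""
    while True:
        order = list(range(N_PIECES))
        rng.shuffle(order)
        # Reject shuffles that leave any piece in place (about 63% of draws).
        if all(piece != slot for slot, piece in enumerate(order)):
            return order


def assign_rotations(rng):
    """Give each orientation to exactly a quarter of the pieces."""
    rotations = [angle for angle in ORIENTATIONS for _ in range(N_PIECES // 4)]
    rng.shuffle(rotations)
    return rotations


def make_puzzle(complex_condition, seed=None):
    """Return a list of (piece index, rotation) pairs, one per board slot."""
    rng = random.Random(seed)
    positions = scramble_positions(rng)
    rotations = assign_rotations(rng) if complex_condition else [0] * N_PIECES
    return list(zip(positions, rotations))


puzzle = make_puzzle(complex_condition=True, seed=1)
```

Fixing the count per orientation while shuffling which pieces receive it mirrors the stated goal: the same level of rotation for everyone, with random variation in the exact puzzle seen.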

Images were counterbalanced in pairs: the first two trials always used the same two
images (image pair 1) and the last two trials always used the same two images (image pair 2).
Images were counterbalanced such that
each image was seen in both standard and complex versions across participants. In all cases
participants used the mouse to interact with the image, with a left click for location manipulation
and a right click for rotation manipulation (when implemented).


Figure 13. A diagram of the first four experimental trials, in one of two counterbalanced configurations.
Specific comparisons are annotated. The design allows for multiple comparisons of task demands, as well as
practice effects.

An overview of the experimental procedure for one counterbalanced configuration, with
descriptions of the task parameters, is presented in Table 1. A subset for trials 1 through 4 is
diagrammed in Figure 13. The design was a
mixed design, with a within subjects manipulation of task demands. The first four trials were
counterbalanced in an A-B-B-A / B-A-A-B blocked design across participants. Each A-B block
was further counterbalanced across two images. This facilitated a task demand comparison
(standard to complex; trials 1 to 2 and 3 to 4), tests of practice in trials 1 & 4,
and a repeated difficulty comparison in trials 2 & 3.
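
A minimal sketch of how the A-B-B-A / B-A-A-B schedule for trials 1 through 4 could be enumerated is given below. It assumes that the image order is swapped within both pairs at once, which the report does not state; the dictionary keys, image labels, and function name are illustrative only.

```python
from itertools import product

ORDERS = {
    "SCCS": ["standard", "complex", "complex", "standard"],
    "CSSC": ["complex", "standard", "standard", "complex"],
}
IMAGE_PAIRS = [
    ("mountain lake", "sunflowers"),         # pair 1, trials 1 & 2
    ("Cleveland skyline", "printing press"), # pair 2, trials 3 & 4
]


def build_schedule(order, swap_images):
    """Return (trial, puzzle type, image) tuples for trials 1-4."""
    images = []
    for first, second in IMAGE_PAIRS:
        images += [second, first] if swap_images else [first, second]
    return [
        (trial, puzzle, image)
        for trial, (puzzle, image) in enumerate(zip(ORDERS[order], images), start=1)
    ]


# Four configurations: two puzzle-type orders by two image orders.
schedules = {
    (order, swap): build_schedule(order, swap)
    for order, swap in product(ORDERS, (False, True))
}
```

Across the four configurations every image appears in both the standard and the complex version, which is the property the counterbalancing is meant to guarantee.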


Table 1. An example of the experimental implementation for the first counterbalance type in
Experiment 1.

Trial Number                          Puzzle Type                       Task Description
                                      (A-B-B-A (+1) counterbalance)
------------------------------------  --------------------------------  -----------------------------------
Instructions & Training (unlimited    Sample Standard & Complex         5x5 Randomized
time to complete training puzzles)    Image
Trial 1 (15 minute time limit)        Standard Puzzle, Image Pair 1     6x6 Randomized, x-y position change
Trial 2 (15 minute time limit)        Complex Puzzle, Image Pair 1      6x6 Randomized, x-y position change
                                                                        + rotated pieces
Trial 3 (15 minute time limit)        Complex Puzzle, Image Pair 2      6x6 Randomized, x-y position change
                                                                        + rotated pieces
Trial 4 (15 minute time limit)        Standard Puzzle, Image Pair 2     6x6 Randomized, x-y position change
Trial 5 (15 minute time limit)        Standard or Complex Image         6x6 Fixed Scramble + Secondary
                                      (Between Subjects)                Audio Task

The  fifth  trial  consisted  of  a  between  subjects  manipulation  of  standard  or  complex 
puzzle,  with  the  addition  of  a  secondary  audio  task.  There  were  16  participants  in  the  standard 
puzzle  condition  and  15  participants  in  the  complex  puzzle  condition.  Unlike  the  previous 
randomized  puzzles,  the  specific  order  of  the  scramble  was  fixed  for  the  final  trial.  One  puzzle 
was  used  for  both  conditions  (fitting  with  randomization  parameters  described  above). 

The secondary audio task was a radio monitoring task, in which participants were
required to listen to a series of messages containing a “call sign” and a specific color/number
code (e.g., “Ready Tiger go to Red 7 Now”). Participants responded to messages containing a
specific call sign by pressing the space bar on a keyboard to activate the microphone and
repeating the entire critical message. There were five distracter call signs: Arrow, Charlie,
Eagle, Ringo, & Tiger. The critical call sign was Barron. There were four color coordinates
(Blue, Red, White, and Green) and seven number coordinates (1 through 7), creating a pool of 28
potential critical signals among 140 possible distracter messages. All messages were 2 seconds in
duration. All messages were recorded by male speakers, randomly selected from a pool of 6 possible
speakers (recordings were available for all 168 possible combinations for all 6 speakers).
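
The message counts in this paragraph follow directly from the call sign, color, and number sets. The short enumeration below reproduces them as a check; the message template is taken from the example in the text, and the function name is illustrative.

```python
from itertools import product

CRITICAL_CALL_SIGN = "Barron"
DISTRACTER_CALL_SIGNS = ["Arrow", "Charlie", "Eagle", "Ringo", "Tiger"]
COLORS = ["Blue", "Red", "White", "Green"]
NUMBERS = range(1, 8)  # 1 through 7


def build_messages(call_signs):
    """Enumerate every call sign / color / number combination."""
    return [
        f"Ready {sign} go to {color} {number} Now"
        for sign, color, number in product(call_signs, COLORS, NUMBERS)
    ]


critical = build_messages([CRITICAL_CALL_SIGN])
distracters = build_messages(DISTRACTER_CALL_SIGNS)
print(len(critical), len(distracters), len(critical) + len(distracters))
# prints: 28 140 168
```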

All  participants  received  the  same  message  order  which  was  randomized  according  to  the 
following  parameters.  Messages  were  presented  in  pairs  that  were  programmed  to  overlap  each 
other  by  1  second.  Beginning  at  10  seconds  from  the  start  of  the  trial,  message  pairs  occurred 
approximately  every  5-6  seconds  thereafter.  A  critical  message  was  programmed  to  occur  once 
for  every  30  second  time  period.  For  the  15  minute  trial,  half  of  the  critical  signals  were  “cut 
ins”  (the  signal  began  in  the  middle  of  a  distracter)  and  half  were  “interrupted”  (the  signal  was 
interrupted  by  a  distracter). 

3.2.5  Dependent Variables

Multiple dependent variables (DVs) were explored for their potential utility in distinguishing
between task difficulty and time on task manipulations. Table 2 summarizes each dependent
variable, a description of its calculation, and its classification as “conventional” or “dynamic”
with regard to variability over time.


Table 2. Summary of dependent variables in Experiment 1.

Variable Name                             Description                                                          Classification
----------------------------------------  -------------------------------------------------------------------  --------------
Average Fixation Time                     Average length of all fixations in a trial                           Conventional
Fixations per Minute                      Number of fixations divided by trial time                            Conventional
β Value                                   Frequency response of scan path                                      Dynamic
Cross Recurrence (Piece vs. Position)     Percentage of recurring states                                       Dynamic
Cross Determinism (Piece vs. Position)    Percentage of recurring states that recur in an order                Dynamic
Diagonal Recurrence (Piece vs. Position)  Percentage of recurring states that recur at the same point in time  Dynamic

3.2.6  Calculation of Fixations


Fixation duration and location were determined using dispersion based techniques from
Salvucci and Goldberg (2000). At a collection rate of 60 Hz, a minimum of 6 consecutively
tracked points with a maximum dispersion of 1 degree (for all 6 points) was considered the
minimum criterion for a fixation. The calculated centroid of the fixation points was considered
the location of the fixation. The resulting location of fixation was used in conjunction with the
location of the puzzle pieces to create a time series of which pieces were fixated upon, and which
position on the grid that piece was in (see below). This method also yields a duration for each
fixation, which is then used for calculations of average fixation time.
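
A minimal sketch of a dispersion-threshold fixation detector in the spirit of Salvucci and Goldberg (2000) is shown below. It assumes gaze samples have already been converted to degrees of visual angle, and it takes dispersion to be the horizontal range plus the vertical range of the window; the report does not state which dispersion formula or which implementation was used, so the function names and defaults are illustrative.

```python
def dispersion(points):
    """Horizontal range plus vertical range of a set of (x, y) samples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=1.0, min_samples=6, sample_ms=1000 / 60):
    """Return fixations as dicts with centroid (x, y) and duration in ms."""
    fixations = []
    start = 0
    while start + min_samples <= len(samples):
        end = start + min_samples
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the dispersion stays under the threshold.
            while end < len(samples) and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            fixations.append({
                "x": sum(p[0] for p in window) / len(window),
                "y": sum(p[1] for p in window) / len(window),
                "duration_ms": len(window) * sample_ms,
            })
            start = end      # continue after the fixation
        else:
            start += 1       # slide past a non-fixation sample
    return fixations


# Toy data: 8 steady samples, a 3-sample saccade, then 6 steady samples.
samples = [(0.0, 0.0)] * 8 + [(3.0, 3.0), (6.0, 6.0), (9.0, 9.0)] + [(10.0, 10.0)] * 6
fixations = detect_fixations(samples)
average_fixation_ms = sum(f["duration_ms"] for f in fixations) / len(fixations)
```

On the toy data the detector returns two fixations, one of 8 samples and one of 6 samples, and the mean of their durations is the kind of value that feeds the average fixation time measure.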

3.2.7  Quantification of Dynamic Structure

As previously mentioned, dynamic structure in a time series can be assessed using
multiple analytical tools. The present analysis will utilize two different mathematical techniques
to analyze dynamic structure in eye gaze time series. The first is β values observed from angular
change time series as used by Aks et al. (2002) and Stephen and Anastas (2011). The angular
difference between each measured X-Y position was computed and the subsequent “gaze step”
time series was then submitted to a Fast Fourier Transform variant optimized for characterizing
the noise category of a time series (Eke et al., 2002).

Specifically, the Power Spectral Density Low (PSDlow) method (Eke et al., 2002) was
used to calculate the spectral slope. The first 8192 angular change values calculated for each trial
were normalized to a mean of zero and a standard deviation of 1. Normalized values were then
bridge detrended (a line connecting the first point and the endpoint is subtracted from the time
series). The Fast Fourier Transform (FFT) was conducted on 7 data windows of 2048 data
points. Four of these windows were adjoining and therefore unique (i.e., the 8192 points are
divided into four adjoining sets of 2048 points); three windows overlapped the ‘borders’ of the
sequential windows. The FFT values for all windows were then averaged, yielding the power
spectral density profile (e.g., relative frequency to absolute power). Finally, the slope was
calculated on only the center of the frequency range (excluding the lowest 1/8 and highest 1/8 of
the frequency range); this eliminates whitening of the frequency response often seen at the
lowest and highest frequencies of the data (Eke et al., 2002). The resulting (log10) spectral
density plot was then fit with a standard regression in which the slope is the β value.
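
The steps in this paragraph can be sketched as follows. This is an illustration of the procedure as described here, not the code used in the study: it assumes the three overlapping windows are centred on the borders of the four adjoining ones, applies no taper because none is mentioned, and uses a small pure-Python FFT only to stay self-contained (a numerical library would normally be used). All names are illustrative.

```python
import cmath
import math
import random


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft(values[0::2]), fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectral_slope(series, n_points=8192, window=2048):
    """Slope of log10 power against log10 frequency (the beta value)."""
    x = list(series[:n_points])
    if len(x) < n_points:
        raise ValueError("series is shorter than n_points")
    # 1. Normalize to zero mean and unit standard deviation.
    mean = sum(x) / n_points
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n_points)
    x = [(v - mean) / sd for v in x]
    # 2. Bridge detrend: subtract the line joining first and last points.
    first, last = x[0], x[-1]
    x = [v - (first + (last - first) * i / (n_points - 1)) for i, v in enumerate(x)]
    # 3. Average the power spectra of half-overlapping windows
    #    (7 windows for 8192 points and a window of 2048).
    half = window // 2
    starts = range(0, n_points - window + 1, half)
    power = [0.0] * half
    for start in starts:
        spectrum = fft(x[start:start + window])
        for k in range(1, half + 1):
            power[k - 1] += abs(spectrum[k]) ** 2 / len(starts)
    # 4. Keep the central frequencies (drop lowest and highest 1/8).
    keep = range(half // 8, half - half // 8)
    log_f = [math.log10((k + 1) / window) for k in keep]
    log_p = [math.log10(power[k]) for k in keep]
    # 5. Ordinary least-squares slope of log power on log frequency.
    mean_f = sum(log_f) / len(log_f)
    mean_p = sum(log_p) / len(log_p)
    sxy = sum((f - mean_f) * (p - mean_p) for f, p in zip(log_f, log_p))
    sxx = sum((f - mean_f) ** 2 for f in log_f)
    return sxy / sxx


# Synthetic check: white noise and its running sum (a random walk).
rng = random.Random(0)
white = [rng.gauss(0, 1) for _ in range(8192)]
brown, total = [], 0.0
for step in white:
    total += step
    brown.append(total)
white_slope = spectral_slope(white)
brown_slope = spectral_slope(brown)
```

For the synthetic series, the white noise slope should sit near zero and the random walk slope should be strongly negative, moving toward the brown noise value of -2, although fitting on linearly spaced bins of a finite sample flattens it somewhat.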

A second technique was used to evaluate dynamic structure in the order alignment of
piece and position fixations. As previously mentioned, Cross Recurrence Quantification
Analysis (CRQA) provides multiple dependent variables which quantify the level and types of
dynamic structure seen between two time series (Webber and Zbilut, 2005; Dale et al., 2011).
This type of analysis was instantiated for nominal or categorical time series in accordance with
practices from Richardson & Dale (2005). In the present analysis, two categorical time series of
fixations were generated: a time series of the positions on the board and a time series of the
pieces of the puzzle that were the focus of the fixation. Each time series was windowed in
increments of 400 fixations; for CREC and CDET the average values across windows were used
for subsequent inferential analysis. Subsequent to the initial CRQA analysis, diagonal
recurrence profiles were calculated across the entire time series in accordance with Richardson
and Dale (2005) to determine the coupling observed between position and piece of fixation.
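
For categorical series, the cross recurrence measures reduce to counting matches between the two series. The sketch below is a hedged illustration, not the analysis code used here. It assumes pieces and board positions share one coding (for example, a piece is labelled by the index of its correct position), counts diagonal lines of at least two points toward determinism, and reads diagonal recurrence as the match rate along a single lag; these choices and all names are assumptions. The 400-fixation windowing is omitted.

```python
def cross_recurrence_measures(series_a, series_b, min_line=2):
    """Return (recurrence rate, determinism) for two categorical series."""
    n, m = len(series_a), len(series_b)
    recurrent = 0  # matching (i, j) pairs in the cross recurrence matrix
    on_lines = 0   # matching pairs that sit on diagonal lines >= min_line
    for offset in range(-(n - 1), m):  # offset = j - i, one diagonal each
        run = 0
        i = max(0, -offset)
        j = i + offset
        while i < n and j < m:
            if series_a[i] == series_b[j]:
                recurrent += 1
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
            i += 1
            j += 1
        if run >= min_line:
            on_lines += run
    rate = recurrent / (n * m)
    determinism = on_lines / recurrent if recurrent else 0.0
    return rate, determinism


def diagonal_recurrence(series_a, series_b, lag=0):
    """Share of time points where series_a[t] matches series_b[t + lag]."""
    pairs = [(t, t + lag) for t in range(len(series_a)) if 0 <= t + lag < len(series_b)]
    if not pairs:
        return 0.0
    return sum(series_a[i] == series_b[j] for i, j in pairs) / len(pairs)


# Toy series: the first three fixations land on pieces in their own slots.
pieces = [1, 2, 3, 1]
positions = [1, 2, 3, 4]
rate, determinism = cross_recurrence_measures(pieces, positions)
same_time = diagonal_recurrence(pieces, positions, lag=0)
```

In the toy example 4 of the 16 cells recur (rate 0.25), 3 of those 4 lie on one diagonal line (determinism 0.75), and 3 of the 4 time points match at lag 0 (0.75). Evaluating `diagonal_recurrence` over a range of lags gives a diagonal recurrence profile.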

In order to determine whether or not any dynamic structure observed is a product of
chance, all dynamic structure analyses were subjected to surrogation tests. Time series were
randomly shuffled and re-analyzed. In the surrogated analyses, any significant temporal
structure present in the original time series should be lost, e.g., β values should approach zero and
%CREC & %CDET should approach chance levels. In all cases for all dynamic variables, the
surrogated measures’ values were statistically different from measures calculated from the
original time series, as measured by paired samples t-tests (p < .05).
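
The logic of a shuffle surrogate can be illustrated with a simpler stand-in statistic. The sketch below uses lag-1 autocorrelation rather than the β or CRQA measures used in this study, purely to keep the example short: shuffling preserves the values in a series but destroys their temporal order, so an order-dependent statistic should collapse toward its chance level. The synthetic random walk and all names are assumptions made for illustration.

```python
import random


def lag1_autocorrelation(series):
    """Correlation between each value and the value that follows it."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def shuffled_surrogate(series, rng):
    """Copy of the series with the same values in random order."""
    surrogate = list(series)
    rng.shuffle(surrogate)
    return surrogate


# Synthetic series with strong temporal structure: a random walk.
rng = random.Random(42)
walk, position = [], 0.0
for _ in range(2000):
    position += rng.gauss(0, 1)
    walk.append(position)

original_stat = lag1_autocorrelation(walk)
surrogate_stat = lag1_autocorrelation(shuffled_surrogate(walk, rng))
```

The original series should give a value close to 1 and the surrogate a value close to 0. In a full analysis the same comparison would be repeated per participant and tested, as described for the paired samples t-tests.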

3.3  Results

3.3.1  Results for Trials 1 through 4

For  trials  1  to  4,  all  dependent  variables  were  subjected  to  a  2  x  2  x  2  mixed  ANOVA 
with  2  levels  of  task  demands  (within  subjects  factor  of  standard  or  complex  puzzle),  2  levels  of 
practice  (within  subjects  factor  of  first  presentation  or  second  presentation)  and  2  levels  of 
counterbalance  (between  subjects  presentation  order  of  Standard-Complex-Complex-Standard 
(SCCS)  or  Complex-Standard-Standard-Complex  (CSSC)).  Aside  from  completion  time,  which 
had  a  directional  expectation,  the  statistical  tests  for  Experiment  1  were  explicitly  two  tailed. 

Completion time had a significant main effect of task demands such that complex puzzles
took longer to complete than standard puzzles, as shown in Table 3. There was no indication of
a performance difference with practice (i.e., no
difference between presentations 1 & 2), nor were any other main effects or interactions
significant for completion time. The differences in completion time were also reflected in the
ability of participants to solve the puzzles in the allotted time. For standard puzzles, 57 of 62
puzzles were successfully solved (92%), with 5 of 62 (8%) puzzles unsolved. For complex
puzzles, 29 of 62 puzzles (47%) were successfully solved, and 33 of 62 puzzles (53%) unsolved.
Separate 2x4 chi squared analyses (one for each difficulty) were performed to address any
potential differences in solve rates between the four images used. In both cases there were no
significant differences in solve rates between images: standard puzzles, χ²(3) = .58, p > .05;
complex puzzles, χ²(3) = 6.94, p > .05. Taken together, these results confirm the expectation that
complex puzzles were more difficult when compared to standard puzzles, and that difficulty
differences were driven by the puzzle type manipulation and not aspects of any individual image.

Conventional gaze metrics included in the present analysis were fixations per minute and
average fixation length. There was a main effect of task demands for fixations per minute, as
shown in Table 3. The number of fixations per minute was lower for complex puzzles than for
standard puzzles. Average fixation length exhibited a significant main effect of task demands, as
shown in Table 3. The length of the average fixation in a complex puzzle was longer than the
average fixation for a standard puzzle. Figure 14 shows an unexpected significant two way
interaction between counterbalance and practice for average fixation length, F(1, 29) = 9.924, p <
.05. When standard puzzles were presented on trials 1 & 4 (SCCS counterbalance) average
fixation decreased for the second presentation, while the inverse was true when complex puzzles
were presented on trials 1 & 4.


Table 3. Summary of significant main effects of Task Demands for trials 1-4.

DV                                       Standard         Complex          F values
                                         Mean (SD)        Mean (SD)
Completion Time (minutes)                8.68 (3.15)      12.9 (2.8)       F(1,29) = 115.22, p < .05
Fixations per Minute (count)             235 (18.3)       229 (16.8)       F(1,29) = 7.57, p < .05
Average Fixation Length (milliseconds)   184.0 (10.99)    198.53 (13.95)   F(1,29) = 81.58, p < .05
Percent Cross Determinism                55.1 (.05)       57.7 (.05)       F(1,29) = 15.78, p < .05

Figure 14. Average Fixation Length (Y-Axis) by Presentation (X-Axis) for two Counterbalanced Orders
(dashed vs. solid lines). When collapsed across Task Demands, Presentations 1 and 2 show divergent
relationships depending on the counterbalanced order. Error bars represent +/- 1 standard error.

Non-conventional metrics of dynamic structure were explored with the expectation that
dynamic structure (reflecting underlying organization of cognitive and motor systems) would
change as a function of task demands and/or practice. The first test of this expectation was for β
values. There were no significant main effects for β values for task demands or practice.
However, there was an unexpected three way interaction of Task Demands x Practice x
Counterbalance for β values: F(1, 29) = 4.66, p < .05. As shown in Figure 15, β values for
complex puzzles do not differ with practice (Figure 15a), while β values for standard puzzles
(Figure 15b) either do not change (separated presentations; e.g., trials 1 and 4) or increase (if
presented back to back; e.g., trials 2 and 3).

Figure 15. a.) β values (Y-Axis) by Presentation (X-Axis) for Complex puzzles in two Counterbalanced Orders
(dotted vs. solid lines). β values did not change across presentations or differ based on the order of the
counterbalance. Error bars represent +/- 1 standard error. b.) β values (Y-Axis) by Presentation (X-Axis) for
Standard puzzles in two Counterbalanced Orders (dashed vs. solid lines). When the two presentations were
separated (solid line) β values were unchanged; however, when the two presentations occurred back to back, β
values increased from Presentation 1 to Presentation 2. Error bars represent +/- 1 standard error.


In addition to frequency-based measures, metrics of dynamic structure derived from a
cross recurrence matrix of piece and position of fixation were tested. For the most basic of these,
cross recurrence, there were no significant main effects or interactions. However, cross
determinism had a significant main effect of task demands (Table 3) and practice (Table 4).
Cross determinism increases by around 2% for both Complex Puzzles (vs. Standard) and the
Second Presentation (vs. First).

There was a significant effect of practice for diagonal recurrence, as shown in Table 4.
Diagonal recurrence increased by around 4% from the first to the second presentation. In this
context, an increase in diagonal recurrence represents an increase in fixations on pieces that are
in the correct positions. Note that this explicit relationship between piece and position holds
because diagonal recurrence was measured at zero lag.
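
The recurrence measures named above can be illustrated with a minimal sketch. It assumes a categorical coding that is not spelled out in this section: each fixation is coded once by the identity of the fixated piece (expressed as that piece's correct grid cell) and once by the grid cell actually fixated, and two codes "recur" when they are equal. The function names, the minimum line length of 2, and the normalization of the zero-lag diagonal are illustrative assumptions, not the procedure used in the report.

```python
def cross_recurrence_matrix(piece_ids, position_ids):
    """Cell (i, j) is 1 when the piece code at fixation i equals the position code at fixation j."""
    return [[1 if p == q else 0 for q in position_ids] for p in piece_ids]


def percent_recurrence(crm):
    """Percentage of all cells in the matrix that are recurrent."""
    cells = sum(len(row) for row in crm)
    return 100.0 * sum(map(sum, crm)) / cells


def percent_determinism(crm, min_line=2):
    """Percentage of recurrent points lying on diagonal lines of at least min_line points."""
    n_rows, n_cols = len(crm), len(crm[0])
    recurrent = sum(map(sum, crm))
    if recurrent == 0:
        return 0.0
    on_lines = 0
    for offset in range(-(n_rows - 1), n_cols):  # every diagonal parallel to the main one
        i, j, run = max(0, -offset), max(0, offset), 0
        while i < n_rows and j < n_cols:
            if crm[i][j]:
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
            i, j = i + 1, j + 1
        if run >= min_line:
            on_lines += run
    return 100.0 * on_lines / recurrent


def diagonal_recurrence(crm, lag=0):
    """Percentage of recurrent cells on the diagonal at the given lag (lag 0 = same fixation)."""
    cells = [crm[i][i + lag] for i in range(len(crm)) if 0 <= i + lag < len(crm[0])]
    return 100.0 * sum(cells) / len(cells)


# Hypothetical four-fixation example: the third fixation lands on a misplaced piece.
pieces = [1, 2, 3, 4]
positions = [1, 2, 5, 4]
crm = cross_recurrence_matrix(pieces, positions)
rec = percent_recurrence(crm)
det = percent_determinism(crm)
diag0 = diagonal_recurrence(crm, lag=0)
```

Under this coding, a recurrent cell on the zero-lag diagonal means the piece being fixated is sitting in its correct cell, which is the interpretation given in the text.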


Table 4. Summary of significant main effects of Practice for trials 1-4.

DV                            First Presentation   Second Presentation   F values
                              Mean (SD)            Mean (SD)
Diagonal Recurrence Profile   18.9 (10.4)          23.2 (11.9)           F(1,29) = 4.66, p < .05
Percent Cross Determinism     55.6 (.05)           57.2 (.04)            F(1,29) = 5.99, p < .05

3.3.2 Results for Trial 5


For  the  inferential  analysis  of  the  final  trial,  which  included  a  secondary  audio  task,  the 
dependent  variables  were  subjected  to  a  one-way  between-subjects  ANOVA  for  Task  Demands 
(Standard  vs.  Complex).  The  significant  results  can  be  seen  in  Table  5.  The  general  expectation 
for  Trial  5  was  that  secondary  task  performance  would  not  change,  but  the  addition  of  a 
secondary  task  could  alter  puzzle  performance  and/or  gaze  behavior  by  reducing  spare  capacity 
of  the  participants. 

For the primary task of solving the puzzle, there was a main effect of Task Demands on
Completion Time, as shown in Table 5. As expected, the Complex puzzle took longer to complete
than the Standard puzzle. This was consistent with the results for trials 1-4.

The  secondary  audio  task  was  scored  for  accuracy  of  responses  to  critical  signals.  The 
values  were  percentages,  since  the  number  of  critical  signals  heard  by  the  participant  was 
determined  by  their  performance  time.  As  expected,  there  were  no  significant  differences  in  the 
percentage  of  correct  signals  between  levels  of  Task  Demands.  The  mean  percentage  for 
Standard  puzzles  was  85.6%  correct  with  a  standard  deviation  of  27%.  For  Complex  puzzles  the 
mean  was  82.5%  correct  with  a  standard  deviation  of  29.9%. 

Average  Fixation  Length  had  a  significant  relationship  with  Task  Demands,  with 
Complex  puzzles  exhibiting  an  average  length  approximately  14  ms  longer  than  Standard 
puzzles.  This  was  the  same  direction  as  was  seen  in  trials  1-4. 

β values did not differ for different Task Demands. The average β for Standard puzzles
was -1.29 (SD = .13) and was -1.31 (SD = .13) for Complex puzzles. This did not support the
expectation that β values would be sensitive to changes in task demands.

Recurrence-based  metrics  show  a  significant  increase  in  Percent  Cross  Recurrence  as 
well  as  Percent  Cross  Determinism.  Cross  Recurrence  was  1.1%  higher  for  Complex  Puzzles  as 
compared  to  standard  puzzles,  and  Cross  Determinism  was  8%  higher  for  Complex  Puzzles. 
Diagonal  Recurrence  was  not  different  across  Task  Demands.  These  results  support  the 
expectation  of  a  change  in  dynamic  structure  under  different  Task  Demands.  Cross  Recurrence 
and  Cross  Determinism  both  indicate  increasing  structure  with  higher  task  demands,  similar  to 
what  was  observed  for  trials  1-4. 


Table 5. Summary of significant results for Trial 5.

DV                                       Standard         Complex          F values
                                         Mean (SD)        Mean (SD)
Completion Time (minutes)                6.89 (2.88)      11.19 (2.89)     F(1,29) = 115.22, p < .05
Average Fixation Length (milliseconds)   192.69 (12.05)   206.66 (10.77)   F(1,29) = 11.52, p < .05
Percent Cross Recurrence                 4.2 (.64)        5.3 (1.2)        F(1,29) = 9.6, p < .05
Percent Cross Determinism                55.16 (6.3)      63.0 (3.7)       F(1,29) = 17.35, p < .05

In order to determine the impact of the secondary audio task on completion time, an analysis
was conducted which compared completion time for Trial 5 to the second presentation (i.e., Trial
3 or 4) of the difficulty condition corresponding to that presented in Trial 5. The main effects of
this analysis are presented in Table 6. Overall, the secondary task shows very little impact; there
was no difference in completion time between the paired trials. The only significant differences
point to effects of Practice, similar to what was observed for trials 1-4.


Table 6. Summary of significant main effects for paired difficulty comparisons with and without the
secondary task.

DV                                       Presentation 2   Trial 5          F values
                                         Mean (SD)        Mean (SD)
Diagonal Recurrence (percent)            22.04 (12.5)     35.09 (13.07)    F(1,27) = 15.976, p < .05
Average Fixation Length (milliseconds)   191.6 (11.77)    199.4 (13.31)    F(1,27) = 39.714, p < .05

3.4 Discussion

At  the  outset  of  Experiment  1,  it  was  hypothesized  that  the  manipulation  of  Task 
Demands  would  cause  a  change  in  Completion  Time;  the  primary  question  was  if  eye  gaze 
measures  would  be  sensitive  to  the  changes,  and  furthermore  if  a  distinction  occurred  between 
the  types  of  eye  gaze  measures  (conventional  and  dynamic).  This  question  was  also  presented  in 
regards  to  Time  on  Task,  as  well  as  spare  capacity  (Trial  5).  The  manipulation  of  Task  Demands 
had  the  expected  effect  on  Completion  Time,  which  was  an  important  manipulation  check.  The 
findings  of  Experiment  1  supported  the  expectation  that  eye  gaze  would  reflect  differences  in 
Completion  Time. 

Previous work suggested that the addition of a secondary task might change gaze behavior
(e.g., May et al., 1990); however, the data from Trial 5 suggest that there were no
significant impacts of spare capacity on gaze behavior. When the eye gaze measures from Trial
5 were compared to the corresponding puzzle type from the second presentation (i.e., Trial 3 or
Trial 4, depending on the counterbalance), the trends observed in trials 1-4 were unchanged when
participants completed a radio monitoring task while completing the puzzle. This may be because
the two tasks draw on different resources (i.e., visual vs. auditory; Wickens, 2002), which would
create a situation in which the two types of tasks used here would be least likely to impact one
another. However, two different task types were required so that the visual display would be
unchanged with the addition of the secondary task.

Generally, measures of eye gaze were sensitive to the different puzzle types. However,
there was no clear distinction between conventional and dynamic measures of gaze; measures of
averaged fixation activity and recurrence measures both showed significant effects of Task
Demands. Average Fixation Length (with a corresponding decrease in Fixations per Minute) and
Cross Determinism were both higher in Complex puzzles. In the present context, Cross
Determinism represents a relationship between piece and position of fixation that is consistent in
time, although not necessarily the correct piece/position placement. Taken together, there was a
tendency to fixate for longer periods of time (and fewer times) in a more structured
sequence in Complex Puzzles. Longer fixations are likely due to the time it takes to orient pieces
when rotated. Deterministic sequences of fixations suggest there is an increase in repeated
fixations for pieces in the same piece/position configuration for complex puzzles. This likely
reflects looking from one piece to another and then back in order to determine where/if the piece
should be moved.

In regard to practice or learning effects, it was expected that learning or strategy shifts
could be seen in Completion Time and also reflected in gaze patterns at different presentations.
While there were no changes in Completion Time, there were main effects of Practice for the
recurrence-based metrics of Percent Cross Determinism and Diagonal Recurrence. In this case,
it is likely that the increase in Percent Determinism is directly related to the increase in Diagonal
Recurrence; Determinism quantifies all sequential fixations, and Diagonal Recurrence quantifies
a subset of those sequential fixations, specifically those in which piece and position are exact
matches in time. As previously mentioned, the increase in Diagonal Recurrence suggests that
participants are learning about the task; they are increasing the number of fixations on pieces in
the correct positions. In terms of looking at the images, it could be the case that participants
were using pieces that had been correctly placed as references or anchors from which to select
and place other pieces. However, there was no effect of Practice on Completion Time, so this
change in gaze patterns did not result in a faster performance outcome.

While only dynamic measures showed significant main effects of Practice, Average Fixation
Length had an interaction with Time on Task and Counterbalance, suggesting that the order of the
puzzle presentations had an effect on the length of fixation. Specifically, the two counterbalance
types show a divergent relationship. Participants in the SCCS counterbalance show an increase in
fixation lengths from the first to the second presentation, whereas those in the CSSC
counterbalance show decreasing fixation lengths on the second presentation. Any interpretation of
this finding would be speculative, beyond suggesting some form of transfer in gaze strategy
that is different between the two presentation orders.

It was expected that the frequency patterns in the scan path, as measured by β values,
would be classified as 1/f patterns, as has been reported in previous work (Aks et al., 2002;
Stephen & Anastas, 2011), and this was the case. It was further expected that β values would be
sensitive to changes in task demands, based partially on the results from Stephen and Anastas
(2011), which link increases in β values to faster reaction times. However, β values did not
change with Task Demands, or at least not in a straightforward manner. Rather than respond to
Task Demands alone, β values for Experiment 1 suggest some type of transfer of gaze patterns
between the two presentations that is dependent on which type of puzzle was seen first. At this
point there is not an explanation for this pattern and it would be extremely speculative to
interpret further.

4.0 EXPERIMENT 2

4.1  Introduction 

Overall, the results from Experiment 1 provide mixed answers for the research questions
of interest at the outset. On one hand, eye gaze metrics were sensitive to the manipulation of task
demands, a demonstration of the link between gaze behavior and performance outcomes. On the
other hand, this was the case for both types of eye gaze metrics (conventional and dynamic).
Expanding the view to the Practice measures, there is an indication that the dynamic measures
may be sensitive to a shift in gaze strategy in ways that conventional measures of eye gaze are
not, but this distinction should be given further study since there was an interaction with
counterbalance type. It was unexpected that the counterbalance type would show significance in
the inferential tests; the counterbalancing of an experimental design is undertaken to nullify
interactions between manipulations. The interactions suggest that the changes over time that
may be due to practice or learning may have been interrupted by the manipulation of task
demands in some way that is unclear at this time.

In an attempt to better understand changes of gaze strategy with Practice, a short (i.e.,
pilot) follow-up experiment was conducted which did not include manipulations of Task
Demands. Experiment 2 was a test of repeated presentations of the same image and puzzle type.
The expectation was that Completion Time would improve with repeated presentations of the
same puzzle/image combination. The goal of Experiment 2 was to initiate systematic
learning improvements in participants’ completion times, and to determine the degree to which
these changes are reflected in different measures of eye gaze (e.g., Average Fixation Length, β
values, and Diagonal Recurrence).

4.2  Methods 

4.2.1  Participants 

All  participants  in  Experiment  2  had  successfully  completed  Experiment  1  (see  above  for 
requirements).  Although  6  participants  were  initially  tested,  one  participant’s  data  was  excluded 
due  to  a  calibration  error,  resulting  in  data  from  5  participants  being  included  in  the  analysis  for 
Experiment  2. 

4.2.2  Image  Selection 

Two images were selected for Experiment 2; both images had previously been included
in either the image selection process or in data collection for Experiment 1. One image, a sport
utility vehicle (Figure 16, left), was used from the pilot image selection process in Experiment 1.
Another image (Figure 16, right) was re-used from Experiment 1, the image of sunflowers.
Images were randomly assigned to participants.


Figure  16.  The  two  images  used  between  subjects  in  Experiment  2.  The  Vehicle  (left  image)  was  used  in  the  pilot  testing 
of  Experiment  1;  the  Sunflowers  (right  image)  image  was  used  for  data  collection  in  Experiment  1. 


4.2.3  Apparatus 

Workstation and eye tracking apparatus were the same as those used for Experiment 1.
Eye tracking was conducted via a Facelab “off the head” eye tracker, hosted on a Dell Latitude
D830 laptop computer (2.2 GHz processor, 2 GB RAM). This combination allows for +/- 1
degree of visual angle eye tracking capability at a collection rate of 60 Hz. Facelab API version
4.6 was integrated with custom software written to display images for this experiment. The
output of the tracking software was the X and Y pixel location of participants’ gaze.

The participant station was an HP Compaq DC80 desktop computer (2.3 GHz processor,
3.5 GB RAM) and an LCD monitor (Samsung 940BX) with a height of 30 cm and a width of
37.5 cm (48 cm diagonal), at 1280 x 1024 resolution.

Images were sized at 1020 x 1020 pixels and viewed at a distance of approximately
60 cm, which corresponds to approximately 27 x 27 degrees of visual angle. When subdivided for
the puzzle into 36 equal sized square pieces (170 pixels width/height), each piece was
approximately 4.75 x 4.75 degrees of visual angle.
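
As a check on the sizes quoted above, the visual angle of an on-screen extent can be computed from the monitor width, horizontal resolution, and viewing distance. The sketch below is illustrative only; the function name is hypothetical, and it assumes square pixels and a viewer centered on the stimulus. With the stated values it gives about 4.75 degrees per piece and roughly 28 degrees for the full image, in line with the approximate figures reported.

```python
import math


def visual_angle_deg(size_px, screen_width_cm, screen_width_px, distance_cm):
    """Visual angle in degrees subtended by an on-screen extent of size_px pixels."""
    # Physical size from the pixel pitch implied by the monitor width and resolution.
    size_cm = size_px * screen_width_cm / screen_width_px
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Values taken from the apparatus description: 37.5 cm wide at 1280 pixels, viewed at 60 cm.
image_deg = visual_angle_deg(1020, 37.5, 1280, 60.0)
piece_deg = visual_angle_deg(170, 37.5, 1280, 60.0)
print(f"image: {image_deg:.2f} deg, piece: {piece_deg:.2f} deg")
```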

4.2.4  Procedure  and  Design 

Participants were given verbal instructions about the task procedures and how to
manipulate the puzzle pieces. Following instructions, participants completed one 5x5 practice
image to familiarize themselves with the task. Once participants solved the practice image,
participants were presented a series of 9 trials of the same test image in the rotated condition.
Rotation was in 90 degree intervals, leaving 4 potential orientations (0, 90, 180, 270
degrees from horizontal). Each orientation was fixed to 25% of pieces (9 pieces per orientation).
Between trials, participants were shown a black target dot on an otherwise white screen.
Participants were asked to fixate on the dot and, after doing so, initiate the task by clicking the
left mouse button. The intact image was then displayed for 5 seconds. Then the image was split
into 36 (6 x 6 grid) equal sized squares. The puzzles were generated in a randomized way such
that all pieces changed position. Each trial lasted until the participant completed the puzzle; there
were no time limits in Experiment 2.
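
The two constraints described above (every piece changes position, and each of the four orientations is applied to exactly 9 of the 36 pieces) can be sketched as follows. This is not the custom software used in the experiment; the function name, the dictionary layout, and the use of rejection sampling to obtain a shuffle with no piece left in place are all assumptions made for illustration.

```python
import random


def make_rotated_puzzle(grid=6, orientations=(0, 90, 180, 270), rng=None):
    """Return one scrambled puzzle: a start cell and rotation for each piece."""
    rng = rng or random.Random()
    n_pieces = grid * grid

    # Rejection sampling: reshuffle until no piece starts in its correct cell.
    while True:
        start_cells = list(range(n_pieces))
        rng.shuffle(start_cells)
        if all(cell != piece for piece, cell in enumerate(start_cells)):
            break

    # Equal share of pieces per orientation (9 each for a 6 x 6 grid), in random order.
    per_orientation = n_pieces // len(orientations)
    rotations = [angle for angle in orientations for _ in range(per_orientation)]
    rng.shuffle(rotations)

    return [
        {"piece": piece, "start_cell": start_cells[piece], "rotation": rotations[piece]}
        for piece in range(n_pieces)
    ]


puzzle = make_rotated_puzzle(rng=random.Random(0))  # seeded only to make the example repeatable
```

Roughly one random shuffle in three leaves no piece in place, so the loop normally ends after a few attempts.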

4.3 Results

In Experiment 2 all dependent variables were subjected to a one-way ANOVA for trial to
explore potential relations to experience or learning. It was expected that overall performance
would improve over trials (i.e., Completion Time would decrease). A primary question was the
degree to which conventional measures of eye gaze (Average Fixation Length or Fixations per
Minute) and/or alternative measures derived from dynamical systems theory (β, Cross
Recurrence, Cross Determinism, and Diagonal Recurrence) would provide additional insights
into the performance changes.

As shown in Figure 17, the expectation that Completion Time would decrease was
supported; there was an overall effect of trial on Completion Time, as reported in Table 7. This
change was in the expected direction; Average Completion Time was reduced from 10.19 min on
Trial 1 to 2.87 min on Trial 9. Completion time sharply decreased after Trial 1, asymptoting
around Trial 5.

For all hypothesized effects, regression models were fit to the data to determine the type
of trend observed. Three model fits were chosen based on research in the domain of nonlinear
dynamics and learning (Crites and Gorman, 2013): linear, exponential, and power law. As a
first step, a linear model should be tested, as it is the simplest model fit. Both exponential and
power law models were fit in order to discriminate between two different types or categories of
learning (Crites and Gorman, 2013). Exponential models are associated with learning novel skills,
while power law fits are associated with persistent learning (e.g., tuning or refining existing
skills) (Stratton et al., 2007). The R squared values for the model fits are summarized in Table 8.
The trajectory for Completion Time was best fit by a power law, which had a better fit than the
exponential and linear models (Table 8). Taken together, there is strong evidence that learning
was taking place with repeated exposure to puzzles of the same image.

Figure 17. Average Completion Time (Y-Axis) by Trial (X-Axis). Performance time decreased with repeated
presentations of the same puzzle. Error bars represent +/- 1 standard error. Three model fits were tested:
linear (grey line), exponential (blue line), and power (red line).
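
The comparison of linear, exponential, and power law fits can be sketched as below. The report does not state how the models were estimated, so this example assumes ordinary least squares, with the exponential and power law models fit on log-transformed values (which is why such fits require positive data) and R squared computed in the original units. The data are synthetic values generated from a power law, not the experimental completion times.

```python
import math


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Proportion of variance in the observed values accounted for by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def compare_fits(trials, values):
    """R squared for linear, exponential, and power law models of values over trials."""
    log_t = [math.log(t) for t in trials]
    log_v = [math.log(v) for v in values]  # requires strictly positive values

    b, a = least_squares(trials, values)   # y = a + b * t
    linear = [a + b * t for t in trials]
    b, a = least_squares(trials, log_v)    # ln y = a + b * t
    exponential = [math.exp(a + b * t) for t in trials]
    b, a = least_squares(log_t, log_v)     # ln y = a + b * ln t
    power = [math.exp(a + b * lt) for lt in log_t]

    return {
        "linear": r_squared(values, linear),
        "exponential": r_squared(values, exponential),
        "power": r_squared(values, power),
    }


trials = list(range(1, 10))
synthetic_times = [10.0 * t ** -0.6 for t in trials]  # synthetic power law data
fits = compare_fits(trials, synthetic_times)
```

For data that drop sharply and then level off, as completion time does here, the power law model is expected to account for more variance than the other two.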


Table 7. Summary of dependent variables tested in Experiment 2.

DV                                         Description                             F value
Completion Time                            Average time to solve puzzle            F(1,4) = 13.59, p < .05
Average Fixation Length                    Average length of all fixations in a    F(1,4) = 11.79, p < .05
                                           trial
Fixations per Minute                       Number of fixations divided by Trial    F(1,4) = 7.01, p > .05
                                           Time
β Value                                    Frequency response of Scan Path         F(1,4) = 2.28, p > .05
Cross Recurrence (Piece vs. Position)      Percentage of Recurring States          F(1,4) = 3.42, p > .05
Cross Determinism (Piece vs. Position)     Percentage of Recurring States that     F(1,4) = 0.71, p > .05
                                           Recur in an order
Diagonal Recurrence (Piece vs. Position)   Percentage of recurring states that     F(1,4) = 23.34, p < .05
                                           recur at the same point in time


Table 8. Summary of model fits for the hypothesized effects in Experiment 2.

DV                        Linear R²   Exponential R²   Power R²
Completion time           .67         .82              .97
Average fixation length   .25         .26              .51
Diagonal recurrence       .67         .59              .81
β Value                   .57         .58              .59

For conventional eye gaze metrics, it was expected that there would be a significant
relationship between Trial and Average Fixation Length in Experiment 2. This expectation was
based on the significant two way interaction (Practice x Counterbalance) for Average Fixation
Length that was observed in Experiment 1. This expectation was supported: Average Fixation
Length increased over the first 5 trials and then seemed to level off at about 200 ms in the final 4
trials, as shown in Figure 18. There was a significant effect of Trial for Average Fixation Length,
as shown in Table 7. When compared to Completion Time, as trial length decreased the length
of the fixations increased. When fit with regression models, Average Fixation Length (Figure 18;
Table 8) shows a moderate power law relationship. Taken together, this indicates that while
Average Fixation Length changes over the course of 9 trials, it is not necessarily changing
systematically with Completion Time.

Figure 18. Average Fixation Length (Y-Axis) by Trial (X-Axis). Average fixation length increased as a
function of trial. Error bars represent +/- 1 standard error. Three model fits were tested: linear (grey line),
exponential (blue line), and power (red line).


β values were tested for change as a function of Trial, in an attempt to clarify
relationships observed in Experiment 1. There was not a significant change in β values with
learning, as shown in Table 7. Figure 19 depicts the absolute value of β across 9 trials.
Absolute values are plotted (rather than the original negative slope values) in order to model the
data (a power law fit cannot be computed for negative values). As shown in Figure 19, β values
are generally flat, with an absolute mean value across trials of 1.14 (signed value is -1.14). β
values in this range are representative of 1/f noise, suggesting that ‘optimum’ dynamic structure
is present in the scan path, but this measure of structure does not change as a function of learning
in this task. Regression fits for β values (Figure 19; Table 8) show that all models fit the data
moderately well (e.g., R² of approximately .57, with no distinctions among the three). Overall,
this suggests that β values are not diagnostic in terms of learning or strategy for this task.

Figure 19. Absolute β values (Y-Axis) by Trial (X-Axis). β values did not change across Trials. Error bars
represent +/- 1 standard error. Three model fits were tested: linear (grey line), exponential (blue line), and
power (red line).
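
For readers unfamiliar with the frequency measure, the slope of a power spectrum on log-log axes can be estimated as in the sketch below. This section does not describe how the slope was computed for the scan path data, so the example is only one common approach: a plain periodogram followed by a least squares line through log power versus log frequency. The signal is synthetic, built so that power falls off as 1/f, and all function names are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(signal):
    """Naive DFT periodogram for bins 1 .. n/2 - 1 (zero frequency and Nyquist omitted)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    bins = list(range(1, n // 2))
    power = []
    for k in bins:
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        power.append(abs(coeff) ** 2)
    return bins, power


def spectral_slope(signal):
    """Least squares slope of log power against log frequency."""
    bins, power = power_spectrum(signal)
    xs = [math.log(k) for k in bins]
    ys = [math.log(p) for p in power]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )


def synthetic_one_over_f(n, rng):
    """Sum of cosines with amplitude k**-0.5 and random phase, so that power scales as 1/k."""
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n // 2)]
    return [
        sum(k ** -0.5 * math.cos(2.0 * math.pi * k * t / n + phases[k]) for k in range(1, n // 2))
        for t in range(n)
    ]


series = synthetic_one_over_f(128, random.Random(1))
slope = spectral_slope(series)  # close to -1 for this synthetic 1/f series
```

A slope near 0 would indicate white noise, and a slope near -1 indicates 1/f structure.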


There was a significant effect of Trial on Diagonal Recurrence Profile, as shown in Table
7. Diagonal Recurrence increased from 18.3% on Trial 1 to 42.24% on Trial 9, as shown in
Figure 20. Note that Diagonal Recurrence was computed at a time lag of zero; higher values of
diagonal recurrence are indicative that participants are fixating on a higher percentage of puzzle
pieces that are in the correct positions. Regression models (Figure 20; Table 8) indicate that
Diagonal Recurrence is best fit by a power law, similar to Completion Time. This is further
evidence of learning; specifically, attunement to the piece/position constraints of an image which
resulted in a more efficient search strategy.

Figure 20. Percent Diagonal Recurrence (Y-Axis) by Trial (X-Axis). Diagonal Recurrence increases with
repeated puzzle presentations. Error bars represent +/- 1 standard error. Three model fits were tested: linear
(grey line), exponential (blue line), and power (red line).


The  results  for  the  analyses  of  variance  in  Experiment  2  indicate  that  there  was  a 
significant  drop  in  Completion  Time,  and  that  there  were  significant  effects  of  Trial  for  two  of 
the  eye  gaze  metrics  (Average  Fixation  Length,  Diagonal  Recurrence).  The  model  fits  give  some 
insight  to  relationships  between  the  gaze  measures  and  Completion  Time.  However,  to  further 
quantify  the  relationships  between  eye  gaze  metrics  and  Completion  Time,  a  correlation  analysis 
was  performed. 

The  repeated  measures  design  means  that  an  omnibus  correlation  analysis  (all 
participants  and  all  trials  in  the  same  test)  would  be  inappropriate.  To  estimate  the  correlation 
across  participants,  correlations  were  computed  for  each  participant  and  averaged  in  accordance 
with  the  procedures  provided  in  Silver  and  Dunlap  (1987).  Briefly,  for  each  participant,  a 
Pearson’s  correlation  between  all  eye  gaze  metrics  and  completion  time  was  computed  across 
trials.  The  computed  r  values  were  converted  to  Fisher’s  z  values  and  averaged  across 
participants.  The  averaged  z  scores  were  then  re-converted  to  Pearson  r  values  and  tested  for 
significance. This procedure is necessary because of the low sample size for Experiment 2 and
because bias in the r statistic at higher values makes it unsuitable to average the raw r values
(Silver & Dunlap, 1987). The average r values can be seen in Table 9.


Given the low number of subjects for Experiment 2, an alpha of .1 was used for
significance testing of correlations. At the .1 level, Diagonal Recurrence had a strong negative
correlation with performance time, r(3) = -.85, p < .10. The correlation results (Table 9) along
with the model fits (e.g., Figure 20, Table 8) indicate that Diagonal Recurrence had the strongest
relationship with Completion Time. Furthermore, Diagonal Recurrence provides complementary
information above and beyond other metrics: specifically, better puzzle performance (lower
Completion Time) is seen when search behavior is more efficient (higher Diagonal Recurrence).
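
The averaging procedure described above (Silver and Dunlap, 1987) can be sketched as follows: each participant's r is converted to Fisher's z, the z values are averaged, and the mean is converted back to r. The per-participant r values below are hypothetical, not the experimental data. The critical r is derived from the two-tailed critical t for alpha = .10 with 3 degrees of freedom (2.353, taken from standard t tables), which matches the critical value of .805 noted under Table 9.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_r(r_values):
    """Average correlations through Fisher's z (undefined when any |r| equals 1)."""
    z_values = [math.atanh(r) for r in r_values]
    return math.tanh(sum(z_values) / len(z_values))


def critical_r(t_critical, df):
    """Smallest |r| that is significant, given the critical t value for df degrees of freedom."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Hypothetical per-participant correlations between one gaze metric and completion time.
example_rs = [-0.95, -0.90, -0.80, -0.75, -0.70]
mean_r = average_r(example_rs)
r_crit = critical_r(2.353, 3)  # two-tailed alpha = .10, df = 3
significant = abs(mean_r) > r_crit
```

Averaging in z space gives a slightly more extreme value than the arithmetic mean of the raw r values, which is the bias correction the procedure is intended to provide.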


Table 9. Average correlation coefficients for the dependent variables tested in Experiment 2.

DV                        Completion   P value   Diagonal     Cross        Cross         Average           Fixations
                          time                   recurrence   recurrence   determinism   fixation length   per minute
Completion time           —            -0.27     -0.85*       0.31         -0.07         -0.66             0.45
P value                                —         0.26         0.0          -0.05         0.2               -0.2
Diagonal recurrence                              —            -0.08        0.12          0.62              -0.46
Cross recurrence                                              —            0.4           -0.07             0.14
Cross determinism                                                          —             0.15              0.33
Average fixation length                                                                  —                 -0.61
Fixations per minute                                                                                       —

Note:  *p<.1,  critical  r  =  .805 
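
The critical r in the note to Table 9 can be related to the t distribution with a short worked computation. This is a sketch under stated assumptions: a two-tailed test at alpha = .10 with df = 3 (as in the reported r(3)), and a tabled critical t of 2.353 for that case, which is entered by hand rather than computed. The function name critical_r is hypothetical.

```python
import math

def critical_r(t_crit, df):
    """Convert a critical t value to the equivalent critical Pearson r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Assumption: two-tailed alpha = .10, df = 3; 2.353 is the standard tabled
# critical t for that case (typed in here, not derived by this code).
T_CRIT_DF3 = 2.353
r_crit = critical_r(T_CRIT_DF3, df=3)
print(round(r_crit, 3))  # 0.805
```

Under these assumptions only a correlation larger in magnitude than about .805 is significant, which is why the -.85 value for Diagonal Recurrence is the only starred entry.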


No  other  correlations  between  eye  gaze  metrics  or  completion  time  were  significant  at  the 
.1  level.  Although  there  was  a  significant  result  in  the  ANOVA,  the  moderate  correlation 
between Average Fixation Length and Completion Time was not significant. These results
carry the caveat that, because of the small sample size in Experiment 2, this correlation
might reach statistical significance with a larger sample. At the outset of Experiment 2, P was
hypothesized  to  be  related  to  Completion  Time.  However,  based  on  the  outcome  of  the  ANOVA 
as  well  as  the  regression  model  fits,  it  is  not  surprising  that  P  values  are  uncorrelated  with 
Completion Time. This suggests that while there are 1/f dynamics exhibited in the scan path for
this task, those dynamics are relatively stable and do not change, even as structure increases for
fixations, specifically Diagonal Recurrence. This could be interpreted as anchoring and
efficiency; much like a traditional puzzle in which one seeks out the important pieces for the
puzzle  (in  a  typical  puzzle  the  “edge”  pieces),  in  this  task  participants  were  likely  seeking 
distinctive  pieces  of  the  puzzle.  For  early  trials,  these  pieces  are  not  in  the  correct  positions,  but 
still  provide  an  anchor  from  which  to  seek  other  matching  pieces  (e.g.,  structure  in  the  scan 
path).  With  multiple  iterations  of  the  puzzle,  learning  takes  place.  The  overall  strategy  is  the 
same (seeking anchors), but with learning, more of the pieces are placed in the correct positions
earlier  in  the  trial. 

5.0  GENERAL  DISCUSSION 

The  present  work  was  undertaken  to  explore  the  possibility  of  eye  gaze  as  a  primary 
measure for state assessment by using alternative indices of dynamic structure. It was expected
that eye gaze would be related to performance, but at the outset, the direction of the
corresponding shift that might be seen in the dynamic patterns of eye gaze was not known. Also of
interest  was  the  degree  to  which  measures  of  dynamic  structure  would  correspond  to  more 
conventional  measures  of  eye  gaze.  Although  the  general  expectations  were  addressed 
previously,  further  interpretation  of  the  results  will  be  organized  around  the  general  effects  of 
Task  Demands  and  Learning,  along  with  general  conclusions  and  future  directions. 

5.1 Task Demands & Gaze Patterns

At the outset of Experiment 1, a primary question of interest was the degree to which
changes in the difficulty of the task (standard vs. rotated puzzles) would influence performance
outcomes, and whether corresponding changes would also be reflected in gaze patterns. The
expectation for performance changes was supported by the data, as complex puzzles took longer
to complete than standard puzzles. Essentially, the information (degrees of freedom) for each
piece was increased when some of the pieces were rotated in the complex puzzle condition, and
this is reflected in the increased performance time. This result was not surprising; however, it
was an important manipulation check.

It was expected that changes in puzzle type would influence eye gaze metrics, and the
expectation that eye gaze would be sensitive to difficulty changes in this task was supported. A
second question concerned any potential differences between conventional and dynamic
measures. There was no distinction between conventional and dynamic measures with regard to
task difficulty; both types showed significant effects. Average fixation length was higher in
complex  puzzles,  likely  due  to  the  need  to  fixate  longer  while  pieces  are  rotated  to  their  correct 
orientations.  Higher  levels  of  determinism  in  complex  puzzles  could  indicate  an  anchoring 
strategy,  as  previously  mentioned. 

The  expectation  that  spare  capacity  of  the  participants  would  alter  gaze  patterns  was  not 
supported.  When  completing  the  secondary  audio  task,  participants’  task  performance  and  gaze 
patterns did not change in a measurable way. There was a generally detectable difference
between levels of Task Demands for Completion Time and Average Fixation Length, but no
interactions  or  differences  when  compared  to  puzzles  of  the  same  type  from  trials  1-4. 
Comparison  of  matched  puzzle  conditions  with  and  without  the  secondary  task  showed  no 
difference  in  performance;  and  gaze  patterns  showed  similar  effects  to  trials  1-4. 

It may be the case that the type of task, as well as the manipulation of task demands
implemented in Experiment 1, were not robust enough to alter gaze patterns in a way to which
dynamic measures would be differentially sensitive. The literature regards 1/f as a relatively
stable phenomenon; deviations are seen when systems are in a state of pathology or other
significant duress (Bassingthwaighte, 1994). Although there was not an
expected  distinction  between  conventional  and  dynamic  measures  of  gaze,  there  was  support  for 
the  idea  that  gaze  patterns  reflect  changes  in  Task  Demands.  The  current  results  lend  support  to 
the  use  of  eye  gaze  as  a  measure  of  task  difficulty  for  the  purposes  of  state  assessment  in  the  task 
used.  However,  eye  gaze  also  appeared  to  be  related  to  a  different  aspect  of  performance, 
specifically  learning  and  strategy. 

5.2 Learning & Gaze Patterns

There  was  support  for  the  idea  that  gaze  patterns  would  change  as  a  result  of  learning. 
Learning effects were more nuanced than the results for Task Demands. In Experiment 1, there
were significant interactions involving the counterbalance for Average Fixation Length and P
values that are difficult to interpret, other than suggesting that
there was a transfer of gaze strategy that differed depending on the order of puzzle type,
and that trials 2 & 3 (repeating puzzle types) show different relationships than trials 1 & 4
(separated presentations of the same puzzle type). Addressing this issue was a primary
motivation  for  Experiment  2,  which  showed  a  clear  performance  improvement  as  participants 
learned  the  particular  aspects  of  each  image.  Experiment  2  also  provided  insight  into  which  gaze 
measures  were  sensitive  to  learning  effects. 

Average Fixation Length had significant relationships with trial in both experiments;
however, the data from Experiment 2 suggest that average fixation length increases over
time. There are multiple reasons why this could be the case, and it is difficult to
discriminate among them with the present results. In Experiment 2 all trials included complex puzzles; the
increase in fixation time could be the result of more time spent studying individual pieces. It
could also be the result of learning the general features of individual pieces and making one
fixation that allowed participants to “see” multiple pieces (i.e., attend to different areas within the
visual field; Heinen et al., 2011).

Data from Experiment 2 suggest that P values did not change significantly with learning;
however, they were in the range of 1/f phenomena. This suggests that the scan path within each
trial is characterized by a relatively stable power law (as stated previously, 1/f frequency
responses are indicative of power law relationships). Power law
relationships are representative of tuning or refining existing learning, rather than learning new
skills (Crites and Gorman, 2013). It is easy to see why visual search would fit these criteria; from
early ages we are searching for objects in the environment, and the present task is a different spin
on visual search. The power law finding is consistent with research from Aks (2011), who
determined 1/f patterns were present in visual search. Both Aks et al. (2011) and Stephen and
Anastas (2011) interpret 1/f patterns as efficient search. The current data support this idea and
provide further evidence via the cross-recurrence-based measure of diagonal recurrence.
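
The link between a 1/f pattern and a power law can be made concrete with a small sketch that estimates the slope of log power against log frequency. It assumes that the P values discussed here are slopes of this kind, which this section does not state explicitly; the function names, the naive DFT, and the synthetic signal are illustrative assumptions and not the report's method or data.

```python
import cmath
import math

def power_spectrum(x):
    """Naive DFT power at frequencies k = 1 .. N/2 - 1 (mean removed)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum

def spectral_slope(x):
    """Least-squares slope of log10(power) against log10(frequency)."""
    pts = [(math.log10(f), math.log10(p)) for f, p in power_spectrum(x) if p > 0]
    m = len(pts)
    mx = sum(a for a, _ in pts) / m
    my = sum(b for _, b in pts) / m
    num = sum((a - mx) * (b - my) for a, b in pts)
    den = sum((a - mx) ** 2 for a, _ in pts)
    return num / den

# Synthetic check signal (not study data): cosines whose power falls off as 1/k.
N = 64
signal = [sum(k ** -0.5 * math.cos(2 * math.pi * k * t / N + k)
              for k in range(1, N // 2))
          for t in range(N)]
slope = spectral_slope(signal)  # close to -1 for this constructed signal
```

A slope near -1 on these log-log axes corresponds to power proportional to 1/f; flatter slopes indicate more random (whiter) variability and steeper slopes indicate more persistent structure.
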

In Experiment 1, a main effect of trial was observed for Diagonal Recurrence, which was
higher for the second presentation of a puzzle. This was interpreted as learning a more efficient
search  strategy.  This  is  because  Diagonal  Recurrence  represents  a  specific  type  of  structure  in 
the  pattern  of  fixations,  specifically  more  fixations  upon  puzzle  pieces  in  their  correct  positions. 
Note  that  for  Experiment  1,  the  images  seen  in  presentations  1  and  2  were  counterbalanced; 
suggesting  that  participants’  strategy  shift  is  not  due  to  properties  of  a  particular  image.  This 
suggests  that  gaze  strategy  as  measured  by  diagonal  recurrence  may  precede  performance 
changes  in  some  cases. 
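
The idea behind Diagonal Recurrence can be illustrated with a simplified categorical cross recurrence sketch. It assumes each fixation is coded twice, once by the identity of the piece fixated and once by the board position fixated, with a piece's identity equal to the index of its correct position. This coding, the function names, and the example series are illustrative assumptions, not the procedure used in the experiments.

```python
def cross_recurrence(pieces, positions):
    """Binary cross-recurrence matrix: cell [i][j] is 1 when pieces[i] == positions[j]."""
    return [[1 if p == q else 0 for q in positions] for p in pieces]

def recurrence_rate(matrix):
    """Percent of all cells in the matrix that are recurrent."""
    cells = [v for row in matrix for v in row]
    return 100.0 * sum(cells) / len(cells)

def diagonal_recurrence(matrix):
    """Percent of main-diagonal cells that are recurrent."""
    n = min(len(matrix), len(matrix[0]))
    return 100.0 * sum(matrix[i][i] for i in range(n)) / n

# Hypothetical fixation series (not study data).
positions = [1, 2, 3, 4, 5, 6]          # board position fixated
early_pieces = [3, 1, 4, 2, 5, 1]       # most pieces are out of place
late_pieces = [1, 2, 3, 4, 5, 1]        # most pieces are in place

early = diagonal_recurrence(cross_recurrence(early_pieces, positions))
late = diagonal_recurrence(cross_recurrence(late_pieces, positions))
```

In this toy coding a diagonal match means the fixated piece sits in its correct position, so the later series yields the higher diagonal percentage, mirroring the direction of the effect described above.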

When  repeating  puzzles  containing  the  same  image,  as  in  Experiment  2,  the  learning 
effect  becomes  more  pronounced  in  gaze  patterns,  specifically  those  patterns  measured  by 
Diagonal Recurrence. As properties of a specific image become apparent, participants adopt a
more efficient gaze strategy in which they anchor their search on pieces in the correct positions.
Although there were significant effects for both conventional and dynamic measures of eye gaze,
Experiment 2 provides limited support for the idea that dynamic measures are more sensitive to
changes in performance due to learning or strategy, since Diagonal Recurrence had the highest
correlation with performance.

5.3 General Conclusions & Future Directions

Diagonal Recurrence was likely related to better task performance through the learning of a more
efficient search strategy. If this is the case, then differences in Diagonal Recurrence should be
seen between participants who did and did not solve a puzzle. A subset of the data from
Experiment 1, specifically the 2nd presentation of the complex puzzle, was selected as a test of
this idea. From this subset, 13 participants solved the puzzle and 18 did not. Three eye gaze metrics
were tested: Diagonal Recurrence, P values, and Average Fixation Length. The results of this
analysis are presented in Table 10. In this instance, there is a distinction between conventional and
dynamic measures. Diagonal Recurrence is lower for the group that did not solve the puzzle, and
higher for the group that was successful. P values are closer to 1 in magnitude for the group that
solved the puzzle and slightly larger in magnitude for the group that did not solve the puzzle. However, Average
Fixation Length is unchanged between the two groups.

Table 10. Summary of results for a subset of data from the first experiment, split by successful puzzle completion.

Dependent Variable                      Mean (SD) for Completed   Mean (SD) for Incomplete   F values
                                        Puzzles [n = 13]          Puzzles [n = 18]
Diagonal Recurrence (Percent)           29.2 (11.2)               19.2 (14.2)                F(1,29) = 4.512, p < .05
P Value (unitless)                      -1.25 (.12)               -1.34 (.08)                F(1,29) = 5.887, p < .05
Average Fixation Length (milliseconds)  196.5 (14.3)              196.2 (12.8)               F(1,29) = .003, p > .05

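
As a rough arithmetic check on Table 10, a two-group F ratio can be recovered from the reported means, standard deviations, and group sizes. The sketch below is not the authors' analysis, and the function name f_from_summary is hypothetical. Because the tabled means and SDs are rounded, it gives a value near, but not identical to, the reported F(1,29) = 4.512 for Diagonal Recurrence.

```python
def f_from_summary(m1, sd1, n1, m2, sd2, n2):
    """One-way ANOVA F for two groups, computed from means, SDs, and group sizes."""
    grand = (n1 * m1 + n2 * m2) / (n1 + n2)
    ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ss_within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    df_within = n1 + n2 - 2
    return ss_between / (ss_within / df_within), (1, df_within)

# Diagonal Recurrence row of Table 10 (rounded summary values).
f_diag, df = f_from_summary(29.2, 11.2, 13, 19.2, 14.2, 18)
print(round(f_diag, 2), df)  # roughly 4.4 with df = (1, 29)
```
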

Stephen and Anastas (2011) suggested that 1/f structure in eye gaze would be indicative
of better performance in visual search tasks. There is some support for this idea, based on the
performance split; participants who solved the puzzle exhibited patterns in their scan paths that
are closer to 1/f, whereas participants who did not solve the puzzle show a slightly more
structured scan path. However, the overall results suggest that 1/f was a general property of the
scan path in this experiment, rather than diagnostic of performance.

1/f structure was generally present in the scan path, and is thought to be ‘meta stable’
because it represents flexible or adaptable organization in the underlying systems, without
exhibiting too much randomness (e.g., Holden et al., 2009). Note that the methodology used here
performs the frequency analysis on the angular displacement within the measured scan path (i.e.,
the macrostructure of eye gaze), and the recurrence analysis represents a subset of that scan path,
fixations (e.g., part of the microstructure of eye gaze). This discrepancy may account for the
results here. The macrostructure shows dynamic stability (e.g., 1/f), while aspects of the microstructure
were “re-organized” (e.g., fixation patterns change). Only by using both types of dynamic
measures was the distinction observed.

The  distinction  can  be  seen  when  looking  at  two  cross  recurrence  matrices  for  the  same 
participant  in  Experiment  2.  Figure  21  shows  the  cross  recurrence  matrix  for  a  subset  of  data 
from  the  initial  stages  of  trial  1  (the  first  600  fixations).  Figure  22  shows  the  cross  recurrence 
matrix  for  all  of  the  data  from  trial  9  (approximately  600  fixations).  However,  there  is  a  clear 
distinction  in  the  two  based  on  the  levels  of  Diagonal  Recurrence.  Diagonal  recurrence  is  around 
1%  early  in  trial  1  and  around  45%  for  trial  9.  Note  that  for  both  of  these  trials,  the  overall  scan 
path was classified as 1/f, suggesting that there is a great deal of flexibility in how 1/f variability
can  appear  in  the  scan  path. 

Figure 21. Puzzle piece (Y-Axis) by position (X-Axis) Cross Recurrence matrix for one participant in the first
learning trial. Shaded grey areas represent matching values between the two series (Recurrence). Line
structures represent matching values in an order (Determinism). Diagonal Recurrence would appear as a
line structure along the diagonal. This plot shows low levels of determinism and diagonal recurrence, which
indicates low coupling between puzzle piece and position, indicating that the participant has not learned
about the piece/position relationships yet.

Figure 22. Puzzle piece (Y-Axis) by Position (X-Axis) Cross Recurrence matrix for one participant in the final
learning trial. Shaded grey areas represent matching values between the two series (Recurrence). Line
structures represent matching values in an order (Determinism). Diagonal Recurrence appears as a line
structure along the diagonal. The high level of diagonal recurrence presented in this figure indicates high
coupling between puzzle piece and position, interpreted as a more efficient gaze strategy with practice.


Overall, in terms of state assessment, there is evidence that eye gaze is not only related to
task difficulty, but also to learning or strategy. The results of these experiments suggest
participants are learning about the relevant degrees of freedom and the overall constraint(s) for
completing the puzzle (e.g., the piece/position relationships within the image). That is,
participants are tuning to the relevant constraints of the task, and becoming more efficient in
their gaze patterns as a result. In this case, the change in dynamic structure is uni-directional;
higher diagonal recurrence is optimal in this task because it measures the sole constraint needed
to complete the puzzle (pieces in the correct position). In more complex tasks (i.e., more
constraints on performance) and particularly novel tasks (e.g., novel skill vs. existing skills;
Crites and Gorman, 2013), it is unlikely that the results would follow the same pattern.

The conclusions about learning and the correlations between gaze metrics should be
further explored by collecting data from a larger sample of participants for Experiment 2.
Experiment 2 was conducted as a follow-up in order to clarify effects of practice that were seen
in Experiment 1. While the small sample helped to make sense of these results, a larger sample
would be more statistically robust, and further trends may be seen (e.g., through more rigorous
statistical analyses of correlational relationships between gaze metrics). This would allow for
inferences about shared relationships between variables used in the present work that show
correlations with each other. For example, Average Fixation Length and Diagonal Recurrence
show a moderate correlation (r = .62) that might approach significance with a larger sample.
These relationships could also be further explored via a hierarchical regression analysis to
determine the overall contribution of each gaze metric to performance outcomes.

The  present  studies  limited  the  analyses  to  the  performance  outcome  of  completion  time 
and  the  different  eye  gaze  metrics.  From  these  analyses,  interpretations  about  strategy  were 
made.  The  addition  of  puzzle  piece  selection  and  manipulation  actions  of  the  participant  could 
provide further insights into operator state. With these data, the specific
movement sequences could be assessed. Furthermore, the series of actions could be crossed with
eye gaze data, via a cross recurrence analysis, in a similar way as the piece/position cross was
implemented here. This could give insight into the coordination of gaze and action,
specifically  the  degree  of  coupling  between  participants’  eye  gaze  and  puzzle  manipulation 
strategies.  For  example,  one  potential  outcome  of  these  analyses  would  be  the  lead/lag 
relationships  between  eye  gaze  and  action. 
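
One way such a lead/lag relationship might be examined is with a lag profile over the diagonals of a cross recurrence matrix. The sketch below is only an illustration of that general idea: the coding of gaze and action as piece identifiers, the function name lag_profile, and the example series are all hypothetical and are not part of the present studies.

```python
def lag_profile(gaze, action, max_lag):
    """Percent of matches between gaze[i] and action[i + lag] for each lag."""
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(gaze[i], action[i + lag])
                 for i in range(len(gaze))
                 if 0 <= i + lag < len(action)]
        profile[lag] = 100.0 * sum(g == a for g, a in pairs) / len(pairs)
    return profile

# Hypothetical coded series: the piece looked at and the piece moved at each step.
gaze = [1, 2, 3, 4, 5, 6, 7, 8]
action = [0, 0, 1, 2, 3, 4, 5, 6]   # same pieces, two steps later
profile = lag_profile(gaze, action, max_lag=3)
best_lag = max(profile, key=profile.get)  # positive lag: gaze leads action
```

In this constructed example the profile peaks at a lag of two steps, which would be read as gaze leading action by two samples; with real data the peak location and height would need to be evaluated against an appropriate baseline.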

Although  eye  gaze  was  singled  out  in  the  present  study,  eye  gaze  is  only  one  of  the 
potential  primary  task  measures  that  could  be  available  in  an  operational  setting.  Future  work 
could  utilize  dynamic  methods  for  additional  primary  measures.  For  example,  communication 
patterns are one area which has been shown to reveal dynamics of team coordination (Russell et
al., 2012). Holden et al.’s (2009; 2011) work on reaction time intervals could also be applied to
more general aspects of operational activities (intervals between required actions). Furthermore,
variability in control mechanisms (e.g., button presses, flight stick movement) may provide
another signal from which to assess operator state using dynamic measures (Strang et al., 2013).

The  current  project  was  undertaken  with  the  goal  of  determining  if  dynamic  patterns  of 
variability  in  eye  gaze  reflect  underlying  properties  of  an  operator.  Initially  the  focus  was  on 
workload  of  the  operator,  and  this  project  demonstrated  the  general  sensitivity  of  eye  gaze  to 
workload  effects.  Also  demonstrated  here  was  the  relationship  of  dynamic  structure  to  learning 
or strategy shifts. This idea was supported for effects of learning across trials, with
some limited support for the idea that dynamic measures were more sensitive than conventional
measures with regard to these learning effects. This is not meant to be an indictment of
average-based measures, but rather to stress that not all variability is error; dynamic analyses may provide a
richer understanding of underlying states of the operator, but are not necessarily superior to
conventional measures. While measures of dynamic structure may be conceptually different
from conventional averages, they require little extra computational effort. Moving
forward, both should be applied (where appropriate) to utilize their complementary explanatory
powers in making sense of human performance data in complex tasks.